Abstract

We study the following generalized matrix rank estimation problem: given an $n \times n$ matrix
and a constant $c \geq 0$, estimate the number of eigenvalues that are greater than $c$. In the
distributed setting, the matrix of interest is the sum of $m$ matrices held by separate machines.
We show that any deterministic algorithm solving this problem must communicate $\Omega(n^2)$ bits,
which is order-equivalent to transmitting the whole matrix. In contrast, we propose a randomized
algorithm that communicates only $\widetilde{O}(n)$ bits. The upper bound is matched by an
$\Omega(n)$ lower bound on the randomized communication complexity. We demonstrate the practical
effectiveness of the proposed algorithm with numerical experiments.


1 Introduction 

Given a parameter $c \geq 0$, the generalized rank of an $n \times n$ positive semidefinite matrix $A$ corresponds
to the number of eigenvalues that are larger than $c$. It is denoted by $\mathrm{rank}(A, c)$, with the usual
rank corresponding to the special case $c = 0$. Estimating the generalized rank of a matrix is useful
for many applications. In the context of large-scale principal component analysis (PCA) [11, 15], 
it is overly expensive to compute the full eigendecomposition before deciding when to truncate 
it. Thus, an important first step is to estimate the rank of the matrix of interest in order to 
determine how many dimensions will be sufficient to describe the data. The rank also provides useful 
information for determining the tuning parameter of robust PCA [4] and collaborative filtering 
algorithms [26, 24]. In the context of numerical linear algebra, a number of eigensolvers [27, 23, 25]
for large-scale scientific applications are based on divide-and-conquer paradigms. It is a prerequisite 
of these algorithms to know the approximate number of eigenvalues located in a given interval. 
Estimating the generalized rank of a matrix is also needed in the context of sampling-based methods 
for randomized numerical linear algebra [13, 21]. For these methods, the rank of a matrix determines 
the number of samples required for a desired approximation accuracy. 

Motivated by large-scale data analysis problems, in this paper we study the generalized rank
estimation problem in a distributed setting, in which the matrix $A$ can be decomposed as the
sum of $m$ matrices

$$A := \sum_{i=1}^{m} A_i, \tag{1}$$

where each matrix $A_i$ is stored on a separate machine $i$. Thus, a distributed algorithm needs to
communicate between $m$ machines to perform the estimation. There are other equivalent formulations
of this problem. For example, suppose that machine $i$ has a design matrix $X_i \in \mathbb{R}^{n \times N_i}$ and
we want to determine the rank of the aggregated design matrix

$$X := (X_1, X_2, \ldots, X_m) \in \mathbb{R}^{n \times N}, \quad \text{where } N := \sum_{i=1}^{m} N_i.$$

Recall that the singular values of the matrix $X$ are equal to the square roots of the eigenvalues of the
matrix $XX^T$. If we define $A_i := X_i X_i^T$, then equation (1) implies that

$$A = \sum_{i=1}^{m} A_i = \sum_{i=1}^{m} X_i X_i^T = XX^T.$$

Thus, determining the generalized rank of the matrix X reduces to the problem of determining the 
rank of the matrix A. In this paper, we focus on the formulation given by equation (1). 

The standard way of computing the generalized matrix rank, or more generally of computing 
the number of eigenvalues within a given interval, is to exploit Sylvester’s law of inertia [12]. 
Concretely, if the matrix $A - cI$ admits the decomposition $A - cI = LDL^T$, where $L$ is unit lower
triangular and $D$ is diagonal, then the number of eigenvalues of matrix $A$ that are greater than $c$ is
the same as the number of positive entries in the diagonal of $D$. While this method yields an exact
count, in the distributed setting it requires communicating the entire matrix $A$. Due to bandwidth
limitations and network delays, the $O(n^2)$ communication cost is a significant bottleneck on the
algorithmic efficiency. For a matrix of rank $r$, the power method [12] can be used to compute
the top $r$ eigenvalues, which reduces the communication cost to $O(rn)$. However, this cost is still
prohibitive for moderate sizes of $r$. Recently, Napoli et al. [10] studied a more efficient randomized
approach for approximating the eigenvalue counts based on Chebyshev polynomial approximation
of high-pass filters. When applying this algorithm to the distributed setting, the communication
cost is $\Theta(pn)$, where $p$ is the degree of Chebyshev polynomials. However, the authors note that
polynomials of high degree can be necessary. 
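To make the inertia-based count concrete, the following is a minimal centralized sketch. It assumes a small dense symmetric matrix stored as nested lists and performs symmetric elimination without pivoting, so the pivots it encounters are the diagonal entries of $D$ in $A - cI = LDL^T$. The function name `count_eigs_above` and the test matrices are illustrative assumptions; a robust implementation would need a pivoted factorization, which this sketch omits.

```python
def count_eigs_above(A, c):
    """Count eigenvalues of the symmetric matrix A that exceed c.

    Uses Sylvester's law of inertia: the pivots of unpivoted symmetric
    elimination on A - c*I are the diagonal entries of D in A - c*I = L D L^T,
    and the number of positive pivots equals the number of eigenvalues above c.
    """
    n = len(A)
    # Work on a copy of the shifted matrix A - c*I.
    S = [[A[i][j] - (c if i == j else 0.0) for j in range(n)] for i in range(n)]
    positives = 0
    for k in range(n):
        d = S[k][k]
        if d == 0.0:
            raise ZeroDivisionError("zero pivot: this unpivoted sketch cannot continue")
        if d > 0:
            positives += 1
        # Eliminate column k below the diagonal (Schur complement update).
        for i in range(k + 1, n):
            factor = S[i][k] / d
            for j in range(k + 1, n):
                S[i][j] -= factor * S[k][j]
    return positives


# The matrix [[2, 1], [1, 2]] has eigenvalues 1 and 3.
print(count_eigs_above([[2.0, 1.0], [1.0, 2.0]], 1.5))  # one eigenvalue above 1.5
print(count_eigs_above([[2.0, 1.0], [1.0, 2.0]], 0.5))  # both eigenvalues above 0.5
```

Note that the whole matrix must be available in one place for this computation, which is the source of the communication bottleneck discussed above.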

In this paper, we study the communication complexity of distributed algorithms for the problem
of generalized rank estimation, in both the deterministic and randomized settings. We establish
upper bounds by deriving practical, communication-efficient algorithms, and we also establish
complexity-theoretic lower bounds. Our first main result shows that no deterministic algorithm is
efficient in terms of communication. In particular, communicating $\Omega(n^2)$ bits is necessary for all
deterministic algorithms to approximate the matrix rank with constant relative error. That such
algorithms cannot be viewed as efficient is due to the fact that by communicating $O(n^2)$ bits, we are
able to compute all eigenvalues and the corresponding eigenvectors. In contrast to the inefficiency
of deterministic algorithms, we propose a randomized algorithm that approximates matrix rank by
communicating $\widetilde{O}(n)$ bits. When the matrix is of rank $r$, the relative approximation error is $1/\sqrt{r}$.
Under the same relative error, we show that $\Omega(n)$ bits of communication is necessary, establishing
the optimality of our algorithm. This is in contrast with the $\Omega(rn)$ communication complexity
lower bound for randomized PCA [16]. The difference shows that estimating the eigenvalue count 
using a randomized algorithm is easier than estimating the top r eigenpairs. 

The research on communication complexity has a long history, dating back to the seminal work 
of Yao [30] and Abelson [1]. Characterizing the communication complexity of linear algebraic
operations is a fundamental question. For the problem of rank testing, Chu and Schnitger [5, 6]
prove the $\Omega(n^2)$ communication complexity lower bound for deterministically testing the singularity
of integer-valued matrices. A successful algorithm for this task is required to distinguish two types of
matrices (the singular matrices and the non-singular matrices with arbitrarily small eigenvalues),
a requirement that is often too severe for practical applications. Luo and Tsitsiklis [20] prove an
$\Omega(n^2)$ lower bound for computing one entry of $A^{-1}$, applicable to exact algorithms (with no form
of error allowed). In contrast, our deterministic lower bound holds even if we force the non-zero
eigenvalues to be bounded away from zero and allow for approximation errors, making it more
widely applicable to the inexact algorithms used in practice. For randomized algorithms, Li et
al. [28, 19] prove $\Omega(n^2)$ lower bounds for the problems of rank testing, computing a matrix inverse,
and solving a set of linear equations over finite fields. To the best of our knowledge, it is not known
whether the same lower bounds hold for matrices in the real field. In other related work, Clarkson
and Woodruff [7] give an $\Omega(r^2)$ space lower bound in the streaming model for distinguishing between
matrices of rank $r$ and $r - 1$. However, such a space lower bound in the streaming model does not
imply a communication complexity lower bound in the two-way communication model studied in 
this paper. 

2 Background and problem formulation 

In this section, we begin with more details on the problem of estimating generalized matrix ranks, 
as well as some background on communication complexity. 

2.1 Generalized matrix rank 

Given an $n \times n$ positive semidefinite matrix $A$, we use $\sigma_1(A) \geq \sigma_2(A) \geq \cdots \geq \sigma_n(A) \geq 0$ to denote
its ordered eigenvalues. For a given constant $c \geq 0$, the generalized rank of order $c$ is given by

$$\mathrm{rank}(A, c) = \sum_{k=1}^{n} \mathbb{I}[\sigma_k(A) > c], \tag{2}$$

where $\mathbb{I}[\sigma_k(A) > c]$ is a 0-1-valued indicator function for the event that $\sigma_k(A)$ is larger than $c$. Since
$\mathrm{rank}(A, 0)$ is equal to the usual rank of a matrix, we see the motivation for using the generalized rank
terminology. We assume that $\|A\|_2 = \sigma_1(A) \leq 1$ so that the problem remains on a standardized
scale. 
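As a centralized reference point, definition (2) can be evaluated directly from the spectrum. The sketch below assumes NumPy is available and that the full matrix fits on one machine; the name `generalized_rank` and the example spectrum are illustrative choices, not part of the distributed algorithm studied later.

```python
import numpy as np


def generalized_rank(A, c):
    """Number of eigenvalues of the symmetric matrix A that exceed c (equation (2))."""
    eigenvalues = np.linalg.eigvalsh(A)
    return int(np.sum(eigenvalues > c))


# Hypothetical example: eigenvalues 0.9, 0.5 and 0.05 hidden behind a random rotation.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
A = Q @ np.diag([0.9, 0.5, 0.05]) @ Q.T
A = (A + A.T) / 2  # remove floating-point asymmetry

print(generalized_rank(A, 0.1))  # counts the eigenvalues 0.9 and 0.5
print(generalized_rank(A, 0.0))  # counts all three eigenvalues
```

In floating-point arithmetic the case $c = 0$ is delicate for rank-deficient matrices, since eigenvalues that are exactly zero may be computed as tiny positive or negative numbers.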

In an $m$-machine distributed setting, the matrix $A$ can be decomposed as a sum $A = \sum_{i=1}^{m} A_i$,
where the $n \times n$ matrix $A_i$ is stored on machine $i$. We study distributed protocols, to be specified
more precisely in the following section, in which each machine $i$ performs local computation
involving the matrix $A_i$, and the machines then exchange messages so as to arrive at an estimate
$\hat{r}(A) \in [n] := \{0, \ldots, n\}$. Our goal is to obtain an estimate that is close to the rank of the matrix
in the sense that

$$(1 - \delta)\,\mathrm{rank}(A, c_1) \leq \hat{r}(A) \leq (1 + \delta)\,\mathrm{rank}(A, c_2), \tag{3}$$

where $c_1 > c_2 > 0$ and $\delta \in [0, 1)$ are user-specified constants. The parameter $\delta$ upper
bounds the relative error of the approximation. The purpose of assuming different thresholds $c_1$
and $c_2$ in bound (3) is to handle the ambiguous case when the matrix $A$ has many eigenvalues
smaller than but very close to $c_1$. If we were to set $c_1 = c_2$, then any estimator $\hat{r}(A)$ would be strictly
prohibited from taking these eigenvalues into account. However, since these eigenvalues are so close
to the threshold, distinguishing them from other eigenvalues just above the threshold is obviously
difficult (but for an uninteresting reason). Setting $c_1 > c_2$ allows us to expose the more fundamental
sources of difficulty in the problem of estimating generalized matrix ranks. 
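The role of the two thresholds in bound (3) can be illustrated with a small check. The sketch below assumes the eigenvalues are given explicitly as a list; the helper names and the example spectrum (two eigenvalues sitting just below $c_1 = 0.5$) are hypothetical.

```python
def count_above(eigenvalues, c):
    """Generalized rank of order c for an explicitly given spectrum."""
    return sum(1 for s in eigenvalues if s > c)


def satisfies_bound(r_hat, eigenvalues, c1, c2, delta):
    """Check inequality (3): (1 - delta) rank(A, c1) <= r_hat <= (1 + delta) rank(A, c2)."""
    lower = (1 - delta) * count_above(eigenvalues, c1)
    upper = (1 + delta) * count_above(eigenvalues, c2)
    return lower <= r_hat <= upper


# Two eigenvalues (0.499 and 0.498) lie just below the threshold c1 = 0.5.
eigs = [0.9, 0.8, 0.499, 0.498, 0.05]

# With c1 > c2, an estimate that counts one of the borderline eigenvalues is accepted.
print(satisfies_bound(3, eigs, c1=0.5, c2=0.1, delta=0.0))
# With c1 = c2, the same estimate is rejected.
print(satisfies_bound(3, eigs, c1=0.5, c2=0.5, delta=0.0))
```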

2.2 Basics of communication complexity 

To orient the reader, here we provide some very basic background on communication complexity 
theory; see the books [18, 17] for more details. The standard set-up in multi-party communication 
complexity is as follows: suppose that there are m players (equivalently, agents, machines, etc.), 
and for $i \in \{1, \ldots, m\}$, player $i$ holds an input string $x_i$. In the standard form of communication
complexity, the goal is to compute a joint function $F(x_1, \ldots, x_m)$ of all $m$ input strings with as
little communication between machines as possible. In this paper, we analyze a communication
scheme known as the public blackboard model, in which each player can write messages on a common
blackboard to be read by all other players. A distributed protocol $\Pi$ consists of a coordinated order
in which players write messages on the blackboard. Each message is constructed from the player's
local input and the earlier messages on the blackboard. At the end of the protocol, some player
outputs the value of $F(x_1, \ldots, x_m)$ based on the information she collects through the process. The
communication cost of a given protocol $\Pi$, which we denote by $C(\Pi)$, is the maximum number of
bits written on the blackboard given an arbitrary input. 

In a deterministic protocol, all messages must be deterministic functions of the local input and 
previous messages. The deterministic communication complexity of computing the function $F$, which we
denote by $\mathcal{D}(F)$, is defined by

$$\mathcal{D}(F) := \min\{C(\Pi) : \Pi \text{ is a deterministic protocol that correctly computes } F\}. \tag{4}$$

In other words, the quantity $\mathcal{D}(F)$ is the communication cost of the most efficient deterministic
protocol.

A broader class of protocols are those that allow some form of randomization. In the public
randomness model, each player has access to an infinite-length random string, and their messages
are constructed from the local input, the earlier messages and the random string. Let $\mathcal{P}_\epsilon(F)$ be the
set of randomized protocols that correctly compute the function $F$ on any input with probability
at least $1 - \epsilon$. The randomized communication complexity of computing function $F$ with failure
probability $\epsilon$ is given by

$$\mathcal{R}_\epsilon(F) := \min\{C(\Pi) \mid \Pi \in \mathcal{P}_\epsilon(F)\}. \tag{5}$$

In the current paper, we adopt the bulk of the framework of communication complexity, but 
with one minor twist in how we define “correctness” in computing the function. For our problem, 
each machine is a player, and the $i$-th player holds the matrix $A_i$. Our function of interest is given
by $F(A_1, \ldots, A_m) = \mathrm{rank}\big(\sum_{i=1}^{m} A_i\big)$. The public blackboard setting corresponds to a broadcast-free
model, in which each machine can send messages to a master node, then the master node broadcasts 
the messages to all other machines without additional communication cost. 

Let us now clarify the notion of “correctness” used in this paper. In the standard communication 
model, a protocol $\Pi$ is said to correctly compute the function $F$ if the output of the protocol is
exactly equal to $F(A_1, \ldots, A_m)$. In this paper, we allow approximation errors in the computation,
as specified by the parameters $(c_1, c_2)$, which loosen the matrix rank to the generalized matrix
ranks, and the tolerance parameter $\delta \in (0, 1)$. More specifically, we say:

Definition 1. A protocol $\Pi$ correctly computes the rank of the matrix $A$ up to tolerances $(c_1, c_2, \delta)$
if the output $\hat{r}(A)$ satisfies inequality (3).

Given this definition of correctness, we denote the deterministic communication complexity of
the rank estimation problem by $\mathcal{D}(c_1, c_2, \delta)$, and the corresponding randomized communication
complexity by $\mathcal{R}_\epsilon(c_1, c_2, \delta)$. The goal of this paper is to study these two quantities, especially their
dependence on the dimension n of matrices. 

In addition to allowing for approximation error, our analysis, in contrast to most classical
communication complexity, allows the input matrices to take real values. However, doing
so does not make the problem substantially harder. Indeed, in order to approximate the matrices
in elementwise $\ell_\infty$-norm up to $\tau$ rounding error, it suffices to discretize each matrix entry using
$O(\log(1/\tau))$ bits. As we discuss in more detail in the sequel, this type of discretization has little
effect on the communication complexity. 
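The discretization step can be sketched as follows. This is only an illustration under the added assumption that every matrix entry lies in $[-1, 1]$; the uniform grid with spacing $2\tau$ and the function names are choices made here, not taken from the analysis.

```python
import math


def bits_per_entry(tau):
    """Bits needed to index a uniform grid of spacing 2*tau covering [-1, 1]."""
    levels = math.floor(1.0 / tau) + 2  # indices 0, 1, ..., floor(1/tau) + 1
    return math.ceil(math.log2(levels))


def quantize(x, tau):
    """Map an entry x in [-1, 1] to the index of its nearest grid point."""
    return round((x + 1.0) / (2.0 * tau))


def dequantize(k, tau):
    """Recover the grid point with index k; it is within tau of the original entry."""
    return -1.0 + 2.0 * tau * k


tau = 2.0 ** -10
x = 0.3337
k = quantize(x, tau)
print(bits_per_entry(tau), abs(dequantize(k, tau) - x) <= tau)
```

The bit count grows like $\log_2(1/\tau)$, matching the $O(\log(1/\tau))$ scaling stated above.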

3 Main results and their consequences 

This section is devoted to statements of our main results, as well as discussion of some of their 
consequences. 

3.1 Bounds for deterministic algorithms 

We begin by studying the communication complexity of deterministic algorithms. Here our main 
result shows that the trivial algorithm—the one in which each machine transmits essentially its 
whole matrix—is optimal up to logarithmic factors. In the statement of the theorem, we assume 
that the $n$-dimensional matrix $A$ is known to have rank in the interval¹ $[r, 2r]$ for some integer
$r < n/4$.

Theorem 1. For matrices $A$ with rank in the interval $[r, 2r]$:

(a) For all $0 < c_2 < c_1$ and $\delta \in (0, 1)$, we have $\mathcal{D}(c_1, c_2, \delta) = O\big(mrn \log\big(\frac{mrn}{c_1 - c_2}\big)\big)$.

(b) For two machines ($m = 2$), constants $0 < c_2 < c_1 < 1/20$ and $\delta \in (0, 1/12)$, we have
$\mathcal{D}(c_1, c_2, \delta) = \Omega(rn)$.

When the matrix $A$ has rank $r$ that grows proportionally with its dimension $n$, the lower bound
in part (b) shows that deterministic communication complexity is surprisingly large: it scales as
$\Theta(n^2)$, which is as large as transmitting the full matrices. Up to logarithmic factors, this scaling is
matched by the upper bound in part (a). It is proved by analyzing an essentially trivial algorithm:
for each index $i = 2, \ldots, m$, machine $i$ encodes a reduced rank representation of the matrix $A_i$,
representing each matrix entry by $O\big(\log\big(\frac{mrn}{c_1 - c_2}\big)\big)$ bits. It sends this quantized matrix $\widetilde{A}_i$ to the
first machine. Given these received messages, the first machine then computes the matrix sum
$\widetilde{A} := A_1 + \sum_{i=2}^{m} \widetilde{A}_i$, and it outputs $\hat{r}(A)$ to be the largest integer $k$ such that $\sigma_k(\widetilde{A}) > (c_1 + c_2)/2$.

¹ We use an interval assumption, as the problem becomes trivial if the rank is fixed exactly.
On the other hand, in order to prove the lower bound, we consider a two-party rank testing
problem. Consider two agents holding matrices $A_1$ and $A_2$, respectively, such that the matrix sum
$A := A_1 + A_2$ has operator norm at most one. Suppose that exactly one of the two following
conditions is known to hold:

• the matrix $A$ has rank $r$, or

• the matrix $A$ has rank between $6r/5$ and $2r$, and in addition its $(6r/5)$-th eigenvalue is lower
bounded as $\sigma_{6r/5}(A) > 1/20$.

The goal is to decide which case is true by exchanging the minimal number of bits between the
two agents. Denoting this problem by RankTest, the proof of part (b) proceeds by showing first
that $\mathcal{D}(\text{RankTest}) = \Omega(rn)$, and then reducing from the RankTest problem to the matrix rank
estimation problem. See Section 4.1 for the proof.

3.2 Bounds for randomized algorithms 

We now turn to the study of randomized algorithms, for which we see that the communication 
complexity is substantially lower. In Section 3.2.1, we propose a randomized algorithm with $\widetilde{O}(n)$
communication cost, and in Section 3.2.3, we establish a lower bound that matches this upper 
bound in various regimes. 


3.2.1 Upper bounds via a practical algorithm 

In this section, we present an algorithm based on uniform polynomial approximations for estimating 
the generalized matrix rank. Let us first provide some intuition for the algorithm before defining 
it more precisely. For a fixed pair of scalars $c_1 > c_2 > 0$, consider the function $H_{c_1,c_2} : \mathbb{R} \to [0, 1]$
given by

$$H_{c_1,c_2}(x) = \begin{cases} 1 & \text{if } x > c_1, \\ 0 & \text{if } x < c_2, \\ \dfrac{x - c_2}{c_1 - c_2} & \text{otherwise.} \end{cases} \tag{6}$$

As illustrated in Figure 1, it is a piecewise linear approximation to a step function. The squared
function $H^2_{c_1,c_2}$ is useful in that it can be used to sandwich the generalized ranks of a matrix $A$. In
particular, given a positive semidefinite matrix $A$ with ordered eigenvalues $\sigma_1(A) \geq \sigma_2(A) \geq \ldots \geq
\sigma_n(A) \geq 0$, observe that we have

$$\mathrm{rank}(A, c_1) \leq \sum_{i=1}^{n} H^2_{c_1,c_2}(\sigma_i(A)) \leq \mathrm{rank}(A, c_2). \tag{7}$$


Our algorithm exploits this sandwich relation in estimating the generalized rank. 
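The sandwich relation (7) is easy to check numerically on an explicit spectrum. The following sketch assumes the eigenvalues are already known (which is of course not the case in the distributed problem); the function names and the example values are illustrative.

```python
def H(x, c1, c2):
    """Piecewise-linear step function H_{c1,c2} from equation (6)."""
    if x > c1:
        return 1.0
    if x < c2:
        return 0.0
    return (x - c2) / (c1 - c2)


def sandwich(eigenvalues, c1, c2):
    """Return (rank(A, c1), sum_i H^2(sigma_i), rank(A, c2)) for a given spectrum."""
    rank_c1 = sum(1 for s in eigenvalues if s > c1)
    rank_c2 = sum(1 for s in eigenvalues if s > c2)
    middle = sum(H(s, c1, c2) ** 2 for s in eigenvalues)
    return rank_c1, middle, rank_c2


# The eigenvalue 0.3 falls between the thresholds and contributes H(0.3)^2 = 0.25.
low, mid, high = sandwich([0.9, 0.6, 0.3, 0.05], c1=0.5, c2=0.1)
print(low, mid, high)
```

Only eigenvalues in the interval $[c_2, c_1]$ contribute a fractional amount, which is why the middle quantity always lies between the two generalized ranks.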

In particular, suppose that we can find a polynomial function $f : \mathbb{R} \to \mathbb{R}$ such that $f \approx H_{c_1,c_2}$,
and which is extended to a function on the cone of PSD matrices in the standard way. Observe that
if $\sigma$ is an eigenvalue of $A$, then the spectral mapping theorem [2] ensures that $f(\sigma)$ is an eigenvalue
of $f(A)$. Consequently, letting $g \sim N(0, I_{n \times n})$ be a standard Gaussian vector, we have the useful
relation

$$\mathbb{E}\big[\|f(A)g\|_2^2\big] = \sum_{i=1}^{n} f^2(\sigma_i(A)) \approx \sum_{i=1}^{n} H^2_{c_1,c_2}(\sigma_i(A)). \tag{8}$$

Figure 1: An illustration of the function $x \mapsto H_{c_1,c_2}(x)$ with $c_1 = 0.5$ and $c_2 = 0.1$.


Combined with the sandwich relation (7), we see that a polynomial approximation $f$ to the function
$H_{c_1,c_2}$ can be used to estimate the generalized rank.
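The first equality in relation (8) can be checked by simulation. The sketch below assumes NumPy, a small random PSD matrix, and an arbitrary illustrative polynomial $f(x) = 3x^2 - 2x^3$; none of these choices come from the experiments reported later.

```python
import numpy as np


def apply_poly(coeffs, A):
    """Return sum_k coeffs[k] * A^k, evaluated by Horner's rule."""
    n = A.shape[0]
    M = np.zeros((n, n))
    for ck in reversed(coeffs):
        M = M @ A + ck * np.eye(n)
    return M


rng = np.random.default_rng(0)
n = 6
B = rng.standard_normal((n, n))
A = B @ B.T
A /= np.linalg.norm(A, 2)  # rescale so that ||A||_2 = 1

coeffs = [0.0, 0.0, 3.0, -2.0]  # illustrative f(x) = 3x^2 - 2x^3
fA = apply_poly(coeffs, A)

sigma = np.linalg.eigvalsh(A)
exact = float(np.sum((3 * sigma**2 - 2 * sigma**3) ** 2))  # sum_i f(sigma_i)^2

G = rng.standard_normal((n, 20000))  # 20000 independent Gaussian vectors as columns
estimate = float(np.mean(np.sum((fA @ G) ** 2, axis=0)))  # Monte Carlo E ||f(A) g||^2
print(exact, estimate)
```

The two printed numbers should agree up to Monte Carlo error, since $\mathbb{E}\|f(A)g\|_2^2$ equals the squared Frobenius norm of $f(A)$.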

If $f$ is a polynomial function of degree $p$, then the vector $f(A)g$ can be computed through $p$
rounds of communication. In more detail, in one round of communication, we can first compute
the matrix-vector product $Ag = \sum_{i=1}^{m} A_i g$. Given the vector $Ag$, a second round of communication
suffices to compute the quantity $A^2 g$. Iterating a total of $p$ times, the first machine is equipped
with the collection of vectors $\{g, Ag, A^2 g, \ldots, A^p g\}$, from which it can compute $f(A)g$.
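The round structure just described can be simulated on a single computer. The sketch below assumes NumPy and represents each machine simply by its local matrix in a Python list; the function names are hypothetical, and the monomial-basis evaluation is used only for illustration (Algorithm 1 uses a Chebyshev recurrence instead).

```python
import numpy as np


def krylov_rounds(local_mats, g, p):
    """Simulate p communication rounds and return [g, A g, ..., A^p g] for A = sum(local_mats)."""
    vectors = [g]
    for _ in range(p):
        current = vectors[-1]
        # One round: the first machine broadcasts `current`; machine i replies with A_i @ current.
        replies = [Ai @ current for Ai in local_mats]
        vectors.append(np.sum(replies, axis=0))
    return vectors


def apply_polynomial(coeffs, vectors):
    """Combine the collected vectors: sum_k coeffs[k] * A^k g."""
    return sum(c * v for c, v in zip(coeffs, vectors))


rng = np.random.default_rng(0)
local_mats = []
for _ in range(3):
    B = rng.standard_normal((4, 4))
    local_mats.append((B + B.T) / 8)
g = rng.standard_normal(4)

vecs = krylov_rounds(local_mats, g, p=3)
fAg = apply_polynomial([1.0, 0.0, 2.0, -1.0], vecs)  # f(x) = 1 + 2x^2 - x^3
print(fAg)
```

Each round transmits vectors of length $n$ only, never a full $n \times n$ matrix.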

Let us now consider how to obtain a suitable polynomial approximation of the function $H_{c_1,c_2}$.
The most natural choice is a Chebyshev polynomial approximation of the first kind: more precisely,
since $H_{c_1,c_2}$ is a continuous function with bounded variation, classical theory [22, Theorem 5.7]
guarantees that the Chebyshev expansion converges uniformly to $H_{c_1,c_2}$ over the interval $[0, 1]$.
Consequently, we may assume that there is a finite-degree Chebyshev polynomial $q_1$ of the first
kind such that

$$\sup_{x \in [0,1]} |q_1(x) - H_{c_1,c_2}(x)| \leq 0.1. \tag{9a}$$

By increasing the degree of the Chebyshev polynomial, we could reduce the approximation
error (set to 0.1 in the expansion (9a)) to an arbitrarily small level. However, a very high degree
could be necessary to obtain an arbitrary accuracy. Instead, our strategy is to start with the
Chebyshev polynomial $q_1$ that guarantees the 0.1-approximation error (9a), and then construct a
second polynomial $q_2$ such that the composite polynomial function $f = q_2 \circ q_1$ has an approximation
error, when measured over the intervals $[0, c_2]$ and $[c_1, 1]$ of interest, that converges linearly in the
degree of function $f$. More precisely, consider the polynomial of degree $2p + 1$ given by

$$q_2(x) = \frac{1}{B(p+1, p+1)} \int_0^x t^p (1 - t)^p \, dt, \quad \text{where } B(\cdot, \cdot) \text{ is the Beta function.} \tag{9b}$$
Figure 2. Comparison of the composite polynomial approximation in Algorithm 2 with the Chebyshev
polynomial expansion, for thresholds (a) $(c_1, c_2) = (0.2, 0.1)$ and (b) $(c_1, c_2) = (0.02, 0.01)$.
The error is measured with the $\ell_\infty$-norm on the interval $[0, c_2] \cup [c_1, 1]$.
The composite polynomial approximation achieves a linear convergence rate as the degree is increased,
while the Chebyshev expansion converges at a much slower rate.


Lemma 1. Consider the composite polynomial $f(x) := q_2(q_1(x))$, where the base polynomials $q_1$
and $q_2$ were previously defined in equations (9a) and (9b) respectively. Then $f(x) \in [0, 1]$ for all
$x \in [0, 1]$, and moreover

$$|f(x) - H_{c_1,c_2}(x)| \leq 2^{-p} \quad \text{for all } x \in [0, c_2] \cup [c_1, 1]. \tag{10}$$

See Appendix A for the proof. 
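The amplification effect of $q_2$ can be observed numerically. The sketch below relies on the standard identity that, for integer parameters, the regularized incomplete Beta function in (9b) equals a binomial tail sum; the function name `q2` and the test points 0.1 and 0.9 (the worst cases allowed by the 0.1-accuracy of $q_1$ within $[0, 1]$) are choices made for this illustration.

```python
from math import comb


def q2(x, p):
    """Polynomial (9b) of degree 2p + 1, written as a binomial tail sum.

    For integer p, (1 / B(p+1, p+1)) * int_0^x t^p (1 - t)^p dt equals
    sum_{j=p+1}^{2p+1} C(2p+1, j) x^j (1 - x)^(2p+1-j).
    """
    m = 2 * p + 1
    return sum(comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(p + 1, m + 1))


# Inputs within 0.1 of 0 or 1 are pushed towards 0 or 1 at a geometric rate in p.
for p in [0, 1, 3, 5]:
    print(p, q2(0.1, p), 1 - q2(0.9, p), 2.0 ** -p)
```

For $p = 1$ the polynomial reduces to $3x^2 - 2x^3$, the familiar cubic smoothstep.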

Figure 2 provides a comparison of the error in approximating $H_{c_1,c_2}$ for the standard Chebyshev
polynomial and the composite polynomial. In order to conduct a fair comparison, we show the
approximations obtained by Chebyshev and composite polynomials of the same final degree, and we
evaluate the $\ell_\infty$-norm approximation error on the interval $[0, c_2] \cup [c_1, 1]$: namely, for a given polynomial
approximation $h$, the quantity

$$\mathrm{Error}(h) := \sup_{x \in [0, c_2] \cup [c_1, 1]} |h(x) - H_{c_1,c_2}(x)|.$$

As Figure 2 shows, the composite polynomial function achieves a linear convergence rate
with respect to its degree. In contrast, the convergence rate of the Chebyshev expansion is sublinear,
and substantially slower than that of the composite function. The comparison highlights
the advantage of our approach over the method only based on Chebyshev expansions. 

Given the composite polynomial $f = q_2 \circ q_1$, we first evaluate the vector $f(A)g$ in a two-stage
procedure. In the first stage, we evaluate $q_1(A)g, q_1^2(A)g, \ldots, q_1^{2p+1}(A)g$ using the Clenshaw
recurrence [8], a procedure proven to be numerically stable [22]. The details are given in Algorithm 1.
In the second stage, we substitute the coefficients of $q_2$ so as to evaluate $q_2(q_1(A))g$. The overall
procedure is summarized in Algorithm 2.

The following result provides a guarantee for the overall procedure (combination of Algorithm 1
and Algorithm 2) when run with degree $p = \lceil \log_2(2n) \rceil$:

Theorem 2. For any $0 < \delta < 1$, with probability at least $1 - 2\exp(\cdots)$, the output of
Algorithm 2 satisfies the bounds

$$(1 - \delta)\,\mathrm{rank}(A, c_1) - 1 \leq \hat{r}(A) \leq (1 + \delta)\big(\mathrm{rank}(A, c_2) + 1\big). \tag{11}$$

Algorithm 1: Evaluation of Chebyshev Polynomial
Input: $m$ machines hold $A_1, A_2, \ldots, A_m \in \mathbb{R}^{n \times n}$; vector $v \in \mathbb{R}^n$; Chebyshev polynomial
expansion $q(x) = \frac{1}{2} a_0 T_0(x) + \sum_{i=1}^{d} a_i T_i(x)$.
Output: matrix-vector product $q(A)v$.

1. Initialize vectors $b_{d+1} = b_{d+2} = 0 \in \mathbb{R}^n$.

2. For $j = d, \ldots, 1, 0$: the first machine broadcasts $b_{j+1}$ to all other machines. Machine $i$
computes $A_i b_{j+1}$ and sends it back to the first machine. The first machine computes

$$b_j := 4\Big(\sum_{i=1}^{m} A_i b_{j+1}\Big) - 2 b_{j+1} - b_{j+2} + a_j v.$$

3. Output $\frac{1}{2}(b_0 - b_2)$.
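A single-process simulation of Algorithm 1 is sketched below. It assumes NumPy, takes the $T_k$ to be Chebyshev polynomials shifted to the interval $[0, 1]$ (that is, $T_k(2x - 1)$, which is what the factor $4A - 2I$ in the recurrence corresponds to), and checks the result against a direct evaluation through the eigendecomposition. The function name and the random test data are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def distributed_clenshaw(local_mats, a, v):
    """Return q(A) v for A = sum(local_mats) and q(x) = a[0]/2 + sum_{k>=1} a[k] T_k(2x - 1)."""
    d = len(a) - 1
    b = [np.zeros_like(v) for _ in range(d + 3)]  # b[d+1] = b[d+2] = 0
    for j in range(d, -1, -1):
        # One round: broadcast b[j+1], then collect A_i b[j+1] from every machine.
        Ab = np.sum([Ai @ b[j + 1] for Ai in local_mats], axis=0)
        b[j] = 4.0 * Ab - 2.0 * b[j + 1] - b[j + 2] + a[j] * v
    return 0.5 * (b[0] - b[2])


rng = np.random.default_rng(1)
n, m = 5, 3
mats = []
for _ in range(m):
    B = rng.standard_normal((n, n))
    mats.append(B @ B.T)
scale = np.linalg.norm(sum(mats), 2)
mats = [Ai / scale for Ai in mats]  # the sum now has spectral norm 1

a = rng.standard_normal(6)  # arbitrary degree-5 expansion coefficients
v = rng.standard_normal(n)
result = distributed_clenshaw(mats, a, v)

# Reference: evaluate q on the eigenvalues of A (note the halved leading coefficient).
sigma, U = np.linalg.eigh(sum(mats))
c = a.copy()
c[0] *= 0.5
reference = U @ (C.chebval(2 * sigma - 1, c) * (U.T @ v))
print(np.max(np.abs(result - reference)))
```

The recurrence touches the matrices only through products $A_i b_{j+1}$, so each of the $d + 1$ rounds communicates vectors of length $n$.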


Algorithm 2: Randomized Algorithm for Rank Estimation
Input: Each of $m$ machines holds one of the matrices $A_1, A_2, \ldots, A_m \in \mathbb{R}^{n \times n}$. Tolerance parameters
$(c_1, c_2)$, polynomial degree $p$, and number of repetitions $T$.

1. (a) Find a Chebyshev expansion $q_1$ of the function $H_{c_1,c_2}$ satisfying the uniform bound (9a).
(b) Define the degree $2p + 1$ polynomial function $q_2$ by equation (9b).

2. (a) Generate a random Gaussian vector $g \sim N(0, I_{n \times n})$.

(b) Apply Algorithm 1 to compute $q_1(A)g$, and sequentially apply the same algorithm to
compute $q_1^2(A)g, \ldots, q_1^{2p+1}(A)g$.

(c) Evaluate the vector $y := f(A)g = q_2(q_1(A))g$ on the first machine.

3. Repeat Step 2 for $T$ times, obtaining a collection of $n$-vectors $\{y_1, \ldots, y_T\}$, and output the
estimate $\hat{r}(A) = \frac{1}{T} \sum_{i=1}^{T} \|y_i\|_2^2$.


Moreover, we have the following upper bound on the randomized communication complexity of 
estimating the generalized matrix rank: 

$$\mathcal{R}_\epsilon\big(c_1, c_2, 1/\sqrt{\mathrm{rank}(A, c_1)}\big) = \widetilde{O}(mn). \tag{12}$$

We show in Section 3.2.3 that the upper bound (12) is unimprovable up to the logarithmic pre-factors.
For now, let us turn to the results of some numerical experiments using Algorithm 2, which
show that in addition to being an order-optimal algorithm, it is also practically useful. 
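Before turning to those experiments, here is a compact single-process simulation of Algorithm 2. It is a sketch under several stated assumptions: NumPy is available; $q_1$ is obtained by Chebyshev interpolation (via `chebinterpolate`) rather than by truncating the Chebyshev expansion, with the degree increased until the error on a grid is at most 0.1; $q_2$ is expanded in the monomial basis; and the test matrix, its split across two machines, and all function names are hypothetical.

```python
import numpy as np
from math import comb, factorial
from numpy.polynomial import chebyshev as C


def H(x, c1, c2):
    """Piecewise-linear step H_{c1,c2} of equation (6)."""
    return np.clip((x - c2) / (c1 - c2), 0.0, 1.0)


def fit_q1(c1, c2, tol=0.1, max_deg=200):
    """Lowest-degree Chebyshev interpolant of H on [0, 1] with grid error <= tol."""
    grid = np.linspace(0.0, 1.0, 2001)
    for deg in range(1, max_deg + 1):
        # chebinterpolate samples on [-1, 1]; t -> (t + 1) / 2 maps to [0, 1].
        coef = C.chebinterpolate(lambda t: H((t + 1) / 2, c1, c2), deg)
        err = np.max(np.abs(C.chebval(2 * grid - 1, coef) - H(grid, c1, c2)))
        if err <= tol:
            return coef
    raise ValueError("no interpolant met the tolerance")


def apply_cheb(local_mats, coef, v):
    """q(A) v for q = sum_k coef[k] T_k(2x - 1) and A = sum(local_mats), via Clenshaw."""
    b1, b2 = np.zeros_like(v), np.zeros_like(v)
    for j in range(len(coef) - 1, 0, -1):
        Ab = np.sum([Ai @ b1 for Ai in local_mats], axis=0)  # one communication round
        b1, b2 = 4 * Ab - 2 * b1 - b2 + coef[j] * v, b1
    Ab = np.sum([Ai @ b1 for Ai in local_mats], axis=0)
    return 2 * Ab - b1 - b2 + coef[0] * v


def q2_coeffs(p):
    """Monomial coefficients of q2 in (9b); entry k multiplies x^k."""
    scale = factorial(2 * p + 1) // factorial(p) ** 2  # equals 1 / B(p+1, p+1)
    coef = np.zeros(2 * p + 2)
    for k in range(p + 1):
        coef[p + k + 1] = scale * comb(p, k) * (-1) ** k / (p + k + 1)
    return coef


def estimate_rank(local_mats, c1, c2, p, T, rng):
    """Average of ||q2(q1(A)) g||^2 over T independent Gaussian vectors g."""
    n = local_mats[0].shape[0]
    q1, c = fit_q1(c1, c2), q2_coeffs(p)
    total = 0.0
    for _ in range(T):
        w, y = rng.standard_normal(n), np.zeros(n)
        for k in range(1, 2 * p + 2):
            w = apply_cheb(local_mats, q1, w)  # w = q1(A)^k g
            y += c[k] * w
        total += float(y @ y)
    return total / T


# Hypothetical test matrix: 15 eigenvalues at 0.8 and 45 at 0.03, split over two machines.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((60, 60)))
A = U @ np.diag([0.8] * 15 + [0.03] * 45) @ U.T
est = estimate_rank([0.3 * A, 0.7 * A], c1=0.5, c2=0.1, p=3, T=40, rng=rng)
print(est)
```

With these settings the estimate should land near 15, the number of eigenvalues above $c_1$, up to the random fluctuation of the Gaussian quadratic forms.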

3.2.2 Numerical experiments 

Given $m = 2$ machines, suppose that machine $i$ (for $i = 1, 2$) receives $N_i = 1000$ data points of
dimension $n = 1000$. Each data point $x$ is independently generated as $x = a + \varepsilon$, where $a \sim N(0, \lambda\Sigma)$
and $\varepsilon \sim N(0, \sigma^2 I_{n \times n})$ are random Gaussian vectors. Here $\Sigma \in \mathbb{R}^{n \times n}$ is a low-rank covariance matrix
of the form $\Sigma := \sum_{i=1}^{r} u_i u_i^T$, where $\{u_i\}_{i=1}^{r}$ are an orthonormal set of vectors in $\mathbb{R}^n$ drawn uniformly
at random. The goal is to estimate the rank $r$ from the observed $N_1 + N_2 = 2000$ data points.

Figure 3. Panel (a): distribution of eigenvalues of matrix $A$. Panel (b): mean squared error of rank
estimation versus the number of iterations for the baseline method by Napoli et al. [10], and three
versions of Algorithm 2 (with parameters $p \in \{0, 1, 5\}$).

Let us now describe how to estimate the rank using the covariance matrix of the samples. Notice
that $\mathbb{E}[xx^T] = \lambda\Sigma + \sigma^2 I_{n \times n}$, of which there are $r$ eigenvalues equal to $\lambda + \sigma^2$ and the remaining
eigenvalues are equal to $\sigma^2$. Letting $x_{i,j} \in \mathbb{R}^n$ denote the $j$-th data point received by machine $i$,
that machine can compute the local sample covariance matrix

$$A_i = \frac{1}{N_1 + N_2} \sum_{j=1}^{N_i} x_{i,j} x_{i,j}^T \quad \text{for } i = 1, 2.$$


The full sample covariance matrix is given by the sum A := A\ + A 2 , and its rank can be estimated 
using Algorithm 2. 

In order to generate the data, we choose the parameters $r = 100$, $\lambda = 0.4$ and $\sigma^2 = 0.1$. These
choices motivate the thresholds $c_1 = \lambda + \sigma^2 = 0.5$ and $c_2 = \sigma^2 = 0.1$ in Algorithm 2. We illustrate
the behavior of the algorithm for three different choices of the degree parameter $p$ (specifically,
$p \in \{0, 1, 5\}$) and for a range of repetitions $T \in \{1, 2, \ldots, 30\}$. Letting $\hat{r}(A)$ denote the output of
the algorithm, we evaluate the mean squared error, $\mathbb{E}[(\hat{r}(A) - r)^2]$, based on 100 independent runs
of the algorithm. 

We plot the results of this experiment in Figure 3. Panel (a) shows the distribution of eigenvalues
of the matrix $A$. In this plot, there is a gap between the large eigenvalues generated by the low-rank
covariance matrix $\Sigma$, and small eigenvalues generated by the random Gaussian noise, showing that
the problem is relatively easy to solve in the centralized setting. Panel (b) shows the estimation
error achieved by the communication-efficient distributed algorithm; notice how the estimation
error stabilizes after $T = 30$ repetitions or iterations. We compare our algorithm for $p \in \{0, 1, 5\}$,
corresponding to polynomial approximations with degree in $\{4, 12, 44\}$. For the case $p = 0$, the
polynomial approximation is implemented by the Chebyshev expansion. For the cases $p = 1$ and
$p = 5$, the approximation is achieved by the composite function $f$. As a baseline method, we
also implement Napoli et al.'s algorithm [10] in the distributed setting. In particular, their method
replaces the function $f$ in Algorithm 2 by a Chebyshev expansion of the high-pass filter $\mathbb{I}\big(x \geq \frac{c_1 + c_2}{2}\big)$.
It is observed that both the Chebyshev expansion with $p = 0$ and the baseline method incur a large
bias in the rank estimate, while the composite function's estimation errors are substantially smaller.
After $T = 30$ iterations, Algorithm 2 with $p = 1$ achieves a mean squared error close to 10, which
means that the relative error of the estimation is around 3%.
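For concreteness, the conversion from mean squared error to relative error uses the true rank $r = 100$ from the experimental setup and treats the root mean squared error as the typical deviation:

$$\frac{\sqrt{\mathrm{MSE}}}{r} \approx \frac{\sqrt{10}}{100} \approx 0.032 \approx 3\%.$$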

3.2.3 Lower Bound 

It is natural to wonder if the communication efficiency of Algorithm 2 is optimal. The following
theorem shows that, in order to achieve the same $1/\sqrt{r}$ relative error, it is necessary to send $\Omega(n)$
bits. As in our upper bound, we assume that the matrix $A$ satisfies the spectral norm bound
$\|A\|_2 \leq 1$. Given an arbitrary integer $r$ in the interval $[16, n/4]$, suppose that the generalized matrix
ranks satisfy the sandwich relation $r \leq \mathrm{rank}(A, c_1) \leq \mathrm{rank}(A, c_2) \leq 2r$. Under these conditions, we
have the following guarantee:

Theorem 3. For any $c_1, c_2$ satisfying $c_1 < 2c_2 < 1$ and any $\epsilon < \epsilon_0$ for some numerical constant $\epsilon_0$,
we have

$$\mathcal{R}_\epsilon\big(c_1, c_2, 1/\sqrt{r}\big) = \Omega(n). \tag{13}$$

See Section 4.3 for the proof of this lower bound.

According to Theorem 3, for matrices with true rank in the interval $[16, n/2]$, the communication
complexity for estimating the rank with relative error $1/\sqrt{r}$ is lower bounded by $\Omega(n)$. This lower
bound matches the upper bound provided by Theorem 2. In particular, choosing $r = 16$ yields the
worst-case lower bound

$$\mathcal{R}_\epsilon(c_1, c_2, 1/4) = \Omega(n),$$

showing that $\Omega(n)$ bits of communication are necessary for achieving a constant relative error. This
lower bound is not trivial relative to the coding length of the correct answer: given that the matrix
rank is known to be between $r$ and $2r$, this coding length scales only as $\Omega(\log r)$.

There are several open problems suggested by the result of Theorem 3. First, it would be
interesting to strengthen the lower bound (13) from $\Omega(n)$ to $\Omega(mn)$, incorporating the natural
scaling with the number of machines $m$. Doing so requires a deeper investigation into the multi-party
structure of the problem. Another open problem is to lower bound the communication
complexity for arbitrary values of the tolerance parameter $\delta$, say as small as $1/r$. When $\delta$ is very
small, communicating $O(mn^2)$ bits is an obvious upper bound, and we are not currently aware of
better upper bounds. On the other hand, whether it is possible to prove an $\Omega(n^2)$ lower bound for
small $\delta$ remains an open question.

4 Proofs 

In this section, we provide the proofs of our main results, with the proofs of some more technical 
lemmas deferred to the appendices. 

4.1 Proof of Theorem 1 

Let us begin with our first main result on the deterministic communication complexity of the 
generalized rank problem. 
4.1.1 Proof of lower bound

We first prove the lower bound stated in part (b) of Theorem 1. Let us recall the RankTest problem
previously described after the statement of Theorem 1. Alice holds a matrix $A_1 \in \mathbb{R}^{n \times n}$ and Bob
holds a matrix $A_2 \in \mathbb{R}^{n \times n}$ such that the matrix sum $A := A_1 + A_2$ has operator norm at most one.
Either the matrix $A$ has rank $r$, or the matrix $A$ has rank between $6r/5$ and $2r$, and in addition its
$(6r/5)$-th eigenvalue is lower bounded as $\sigma_{6r/5}(A) > 1/20$. The RankTest problem is to decide which of
these two mutually exclusive alternatives holds. The following lemma provides a lower bound on
the deterministic communication complexity of this problem:

Lemma 2. For any $r < n/4$, we have $\mathcal{D}(\text{RankTest}) = \Omega(rn)$.

We use Lemma 2 to lower bound $\mathcal{D}(c_1, c_2, \delta)$, in particular by a reduction from the RankTest
problem. Given a RankTest instance, since there are $m \geq 2$ machines, the first two machines can
simulate Alice and Bob, holding $A_1$ and $A_2$ respectively. All other machines hold a zero matrix.
Suppose that $c_1 < 1/20$ and $\delta < 1/12$. If there is an algorithm achieving the bound (3), then if
$A = A_1 + A_2$ is of rank $r$, we have

$$\hat{r}(A) \leq (1 + \delta)\,\mathrm{rank}(A, c_2) \leq \Big(1 + \frac{1}{12}\Big) r = \frac{13r}{12}. \tag{14a}$$

Otherwise, the $(6r/5)$-th eigenvalue of $A$ is greater than $1/20$, so that

$$\hat{r}(A) \geq (1 - \delta)\,\mathrm{rank}(A, c_1) \geq \Big(1 - \frac{1}{12}\Big) \frac{6r}{5} = \frac{11r}{10} > \frac{13r}{12}. \tag{14b}$$

In conjunction, inequalities (14a) and (14b) show that we can solve the RankTest problem by testing
whether or not $\hat{r}(A) \leq \frac{13r}{12}$. Consequently, the deterministic communication complexity $\mathcal{D}(c_1, c_2, \delta)$
is lower bounded by the communication complexity of RankTest.

In order to complete the proof of Theorem 1(a), it remains to prove Lemma 2, and we do so using
a randomized construction. Let us say that a matrix $Q \in \mathbb{R}^{r \times n}$ is sampled from the orthogonal
ensemble if it is sampled in the following way: let $U \in \mathbb{R}^{n \times n}$ be a matrix uniformly sampled from
the group of orthogonal matrices, then $Q$ is the sub-matrix consisting of the first $r$ rows of $U$. We
have the following claim.

Lemma 3. Given matrices $Q_1 \in \mathbb{R}^{r \times n}$ and $Q_2 \in \mathbb{R}^{r \times n}$ independently sampled from the orthogonal
ensemble, we have $\sigma_{6r/5}(Q_1^T Q_1 + Q_2^T Q_2) > \frac{1}{10}$ with probability at least $1 - e^{-3rn/100}$.

See Appendix B for the proof. 


Taking Lemma 3 as given, introduce the shorthand N = . Suppose that we independently 

sample $2^N$ matrices of dimensions $r \times n$ from the orthogonal ensemble. Since there are $2^N(2^N - 1)/2$
distinct pairs of matrices in our sample, the union bound in conjunction with Lemma 3 implies
that

$$\mathbb{P}\Big[\sigma_{6r/5}(Q_i^T Q_i + Q_j^T Q_j) > \frac{1}{10} \ \text{for all } i \neq j\Big] \geq 1 - \frac{2^N(2^N - 1)}{2} \exp\Big(-\frac{3rn}{100}\Big). \tag{15}$$

With our choice of $N$, it can be verified that the right-hand side of inequality (15) is positive. Thus,
there exists a realization of orthogonal matrices $Q_1, \ldots, Q_{2^N} \in \mathbb{R}^{r \times n}$ such that for all $i \neq j$ we have

$$\sigma_{6r/5}(Q_i^T Q_i + Q_j^T Q_j) > \frac{1}{10}.$$

We use this collection of orthogonal matrices in order to reduce the classical Equality problem
to the rank estimation problem. In the Equality problem, Alice has a binary string $x_1 \in \{0,1\}^N$
and Bob has another binary string $x_2 \in \{0,1\}^N$, and their goal is to compute the function

$$\mathrm{Equality}(x_1, x_2) = \begin{cases} 1 & \text{if } x_1 = x_2; \\ 0 & \text{otherwise.} \end{cases}$$

It is well-known [17] that the deterministic communication complexity of the Equality problem is
$D(\mathrm{Equality}) = N + 1$.

In order to perform the reduction, given binary strings $x_1$ and $x_2$ of length $N$, we construct
two matrices $A_1$ and $A_2$ such that their sum $A = A_1 + A_2$ has rank $r$ if and only if $x_1 = x_2$. Since
both $x_1$ and $x_2$ are of length $N$, each of them encodes an integer between $1$ and $2^N$. Defining

$$A_1 = \frac{Q_{x_1}^T Q_{x_1}}{2} \quad \text{and} \quad A_2 = \frac{Q_{x_2}^T Q_{x_2}}{2},$$

the triangle inequality guarantees that

$$\|A\|_2 \leq \|A_1\|_2 + \|A_2\|_2 = \frac{\|Q_{x_1}^T Q_{x_1}\|_2 + \|Q_{x_2}^T Q_{x_2}\|_2}{2} \leq 1,$$

showing that $A$ satisfies the required operator norm bound. If $x_1 = x_2$, then $A = Q_{x_1}^T Q_{x_1}$, which
is a matrix of rank $r$. If $x_1 \neq x_2$, then by our construction of $Q_{x_1}$ and $Q_{x_2}$, we know that the
matrix $A$ has rank between $6r/5$ and $2r$ and moreover that $\sigma_{6r/5}(A) > 1/20$. Thus, we can output
$\mathrm{Equality}(x_1, x_2) = 1$ if we detect the rank of matrix $A$ to be $r$ and output $\mathrm{Equality}(x_1, x_2) = 0$
otherwise. Using this protocol, the Equality evaluation is always correct. As a consequence,
the deterministic communication complexity of RankTest is lower bounded by that of Equality.
Finally, noting that $D(\mathrm{RankTest}) \geq D(\mathrm{Equality}) = N + 1$ and recalling our choice of $N$ completes the proof.
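
The construction behind this reduction can be illustrated numerically. The following is a minimal sketch rather than part of the argument: it assumes NumPy, draws from the orthogonal ensemble through a sign-corrected QR factorization of a Gaussian matrix, and uses illustrative sizes $n = 40$, $r = 10$. The function names are hypothetical.

```python
import numpy as np

def sample_orthogonal_ensemble(r, n, rng):
    """Return the first r rows of a uniformly distributed n x n orthogonal matrix."""
    G = rng.standard_normal((n, n))
    U, R = np.linalg.qr(G)
    U = U * np.sign(np.diag(R))  # sign correction so that U is uniformly distributed
    return U[:r, :]

def rank_test_sum(Q_a, Q_b):
    """A = A_1 + A_2 with A_1 = Q_a^T Q_a / 2 and A_2 = Q_b^T Q_b / 2."""
    return 0.5 * (Q_a.T @ Q_a) + 0.5 * (Q_b.T @ Q_b)

rng = np.random.default_rng(0)
n, r = 40, 10  # illustrative sizes (assumption), chosen so that r <= n/4
Q1 = sample_orthogonal_ensemble(r, n, rng)
Q2 = sample_orthogonal_ensemble(r, n, rng)

A_equal = rank_test_sum(Q1, Q1)  # the case x_1 = x_2
A_diff = rank_test_sum(Q1, Q2)   # the case x_1 != x_2

rank_equal = np.linalg.matrix_rank(A_equal, tol=1e-8)
eigs_diff = np.sort(np.linalg.eigvalsh(A_diff))[::-1]
sigma_6r5 = eigs_diff[6 * r // 5 - 1]  # (6r/5)-th largest eigenvalue of A
op_norm = eigs_diff[0]

print(rank_equal, sigma_6r5, op_norm)
```

In the equal case the sum is an orthogonal projector of rank $r$, while in the unequal case one expects the $(6r/5)$-th eigenvalue to clear the $1/20$ threshold, which is what separates the two RankTest alternatives.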


4.1.2 Proof of upper bound 

In order to prove the upper bound stated in part (b), we analyze the algorithm described following
the theorem statement. If the matrix $A = \sum_{i=1}^m A_i$ has rank at most $2r$, then given the PSD
nature of the component matrices, each matrix $A_i$ also has rank at most $2r$. Consequently, we can
find a factorization of the form $A_i = B_i B_i^T$ where $B_i \in \mathbb{R}^{n \times 2r}$. Let $\tilde{B}_i$ be a quantization of the
matrix $B_i$, allocating $\log_2\big(\frac{12mrn}{c_1 - c_2}\big)$ bits to each entry. Note that each machine must transmit at
most $2rn \log_2\big(\frac{12mrn}{c_1 - c_2}\big)$ bits in order to convey the quantized matrix $\tilde{B}_i$.

Let us now analyze the approximation error. By our choice of quantization, we have

$$\|\tilde{B}_i - B_i\|_{\mathrm{op}} \leq \|\tilde{B}_i - B_i\|_F \leq \sqrt{2rn} \cdot \frac{c_1 - c_2}{12mrn} = \frac{c_1 - c_2}{6m\sqrt{2rn}}.$$

Defining $\tilde{A}_i = \tilde{B}_i \tilde{B}_i^T$, we have

$$\|A_i - \tilde{A}_i\|_F \leq \sqrt{2rn}\, \|B_i - \tilde{B}_i\|_F \big(\|B_i\|_{\mathrm{op}} + \|\tilde{B}_i\|_{\mathrm{op}}\big) \leq \frac{c_1 - c_2}{6m} \Big(2 + \frac{c_1 - c_2}{6m\sqrt{2rn}}\Big) \leq \frac{c_1 - c_2}{2m},$$

where the final inequality follows as long as $\frac{c_1 - c_2}{6m\sqrt{2rn}} \leq 1$.

Consequently, the sum $\tilde{A} = \sum_{i=1}^m \tilde{A}_i$ satisfies the bound

$$\|A - \tilde{A}\|_F \leq \sum_{i=1}^m \|A_i - \tilde{A}_i\|_F \leq \frac{c_1 - c_2}{2}.$$

Applying the Wielandt-Hoffman inequality [14] yields the upper bound

$$|\sigma_k(A) - \sigma_k(\tilde{A})| \leq \|A - \tilde{A}\|_F \leq (c_1 - c_2)/2 \quad \text{for all } k \in [n]. \tag{16}$$

Recalling that $\hat{r}(A)$ is the largest integer $k$ such that $\sigma_k(\tilde{A}) > (c_1 + c_2)/2$, inequality (16) implies
that

$$(c_1 + c_2)/2 \geq \sigma_{\hat{r}(A)+1}(\tilde{A}) \geq \sigma_{\hat{r}(A)+1}(A) - (c_1 - c_2)/2,$$

which implies $\sigma_{\hat{r}(A)+1}(A) \leq c_1$. This upper bound verifies that $\hat{r}(A) \geq \mathrm{rank}(A, c_1)$. On the other hand,
inequality (16) also yields

$$(c_1 + c_2)/2 < \sigma_{\hat{r}(A)}(\tilde{A}) \leq \sigma_{\hat{r}(A)}(A) + (c_1 - c_2)/2,$$

which implies $\sigma_{\hat{r}(A)}(A) > c_2$ and $\hat{r}(A) \leq \mathrm{rank}(A, c_2)$. Combining the above two inequalities yields the
claim (3).
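
The quantize-and-threshold argument can be traced in a small experiment. The following is a minimal sketch under stated assumptions: it uses NumPy, a factor with $2r$ columns per machine, rounding of each entry to the nearest multiple of $(c_1 - c_2)/(12mrn)$, and illustrative values of $n$, $r$, $m$, $c_1$, $c_2$. The helper names are hypothetical.

```python
import numpy as np

def quantize(B, step):
    """Round every entry of B to the nearest multiple of `step`."""
    return np.round(B / step) * step

def generalized_rank(A, c):
    """Number of eigenvalues of the symmetric matrix A that exceed c."""
    return int(np.sum(np.linalg.eigvalsh(A) > c))

rng = np.random.default_rng(1)
n, r, m = 30, 3, 4        # illustrative sizes (assumption)
c1, c2 = 0.5, 0.1         # illustrative thresholds (assumption)

# The factors B_i = U C_i share a 2r-dimensional column space, so rank(A) <= 2r.
U, _ = np.linalg.qr(rng.standard_normal((n, 2 * r)))
Bs = [U @ rng.standard_normal((2 * r, 2 * r)) for _ in range(m)]
scale = np.sqrt(np.linalg.norm(sum(B @ B.T for B in Bs), 2))
Bs = [B / scale for B in Bs]            # rescale so that the operator norm of A is one
A = sum(B @ B.T for B in Bs)

step = (c1 - c2) / (12 * m * r * n)     # per-entry precision from the analysis above
Bs_tilde = [quantize(B, step) for B in Bs]
A_tilde = sum(Bt @ Bt.T for Bt in Bs_tilde)

r_hat = generalized_rank(A_tilde, (c1 + c2) / 2)
frob_err = np.linalg.norm(A - A_tilde, "fro")
print(generalized_rank(A, c1), r_hat, generalized_rank(A, c2), frob_err)
```

The printed values allow a direct comparison of $\hat{r}(A)$ with $\mathrm{rank}(A, c_1)$ and $\mathrm{rank}(A, c_2)$, and of the Frobenius error with $(c_1 - c_2)/2$.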

4.2 Proof of Theorem 2 

We split the proof into two parts, corresponding to the upper bounds (11) and (12) respectively. 

Proof of upper bound (11): Let $\lambda_j$ be the $j$-th largest eigenvalue of $A$ and let $v_j$ be the
associated eigenvector. Let function $f$ be defined as $f(x) := q_2(q_1(x))$. Using basic linear algebra,
we have

$$\|y\|_2^2 = \sum_{j=1}^n f^2(\lambda_j) (v_j^T g)^2. \tag{17}$$

Since $g$ is an isotropic Gaussian random vector, the random variables $Z_j = (v_j^T g)^2$ are i.i.d., each
with $\chi^2$ distribution with one degree of freedom. To analyze the concentration behavior of the $Z$
variables, we recall the notion of a sub-exponential random variable.

A random variable $Y$ is called sub-exponential with parameter $(\sigma^2, \beta)$ if $\mathbb{E}[Y] = 0$ and the
moment generating function is upper bounded as $\mathbb{E}[e^{tY}] \leq e^{t^2\sigma^2/2}$ for all $|t| \leq 1/\beta$. The following
lemma, proved in Appendix C, characterizes some basic properties of sub-exponential random
variables.

Lemma 4. (a) If $Z \sim \chi^2_1$, then both $Z - 1$ and $1 - Z$ are sub-exponential with parameter $(4, 4)$.

(b) Given an independent sequence $\{Y_i\}_{i=1}^n$ in which $Y_i$ is sub-exponential with parameter $(\sigma_i^2, \beta_i)$,
then for any choice of non-negative weights $\{\alpha_i\}_{i=1}^n$, the weighted sum $\sum_{i=1}^n \alpha_i Y_i$ is sub-exponential
with parameters $\big(\sum_{i=1}^n \alpha_i^2 \sigma_i^2, \max_{i \in [n]} \{\alpha_i \beta_i\}\big)$.

(c) If $Y$ is sub-exponential with parameter $(\sigma^2, \beta)$, then

$$\mathbb{P}[Y \geq t] \leq e^{-\frac{t^2}{2\sigma^2}} \quad \text{for all } t \in [0, \sigma^2/\beta).$$

We consider $\|y\|_2^2$ as well as the associated lower bound $L := \sum_{j=1}^{\mathrm{rank}(A,c_1)} f^2(\lambda_j)(v_j^T g)^2$. By parts (a)
and (b) of Lemma 4, the variable $\|y\|_2^2 - \mathbb{E}[\|y\|_2^2]$ is sub-exponential with parameter $\big(4\sum_{j=1}^n f^2(\lambda_j), 4\big)$,
and the variable $\mathbb{E}[L] - L$ is sub-exponential with parameter $\big(4\sum_{j=1}^{\mathrm{rank}(A,c_1)} f^2(\lambda_j), 4\big)$. In order to
apply part (c) of Lemma 4, we need upper bounds on the sum $\sum_{j=1}^n f^2(\lambda_j)$, as well as upper/lower
bounds on the sum $\sum_{j=1}^{\mathrm{rank}(A,c_1)} f^2(\lambda_j)$. For the first sum, we have

$$\sum_{j=1}^n f^2(\lambda_j) = \sum_{j=1}^{\mathrm{rank}(A,c_2)} f^2(\lambda_j) + \sum_{j=\mathrm{rank}(A,c_2)+1}^{n} f^2(\lambda_j) \leq \mathrm{rank}(A, c_2) + n 2^{-p} \leq \mathrm{rank}(A, c_2) + 1, \tag{18}$$

where the last two inequalities use Lemma 1 and the fact that $p = \lceil \log_2(2n) \rceil$. For the second sum,
using Lemma 1 implies that

$$\mathrm{rank}(A, c_1) \geq \sum_{j=1}^{\mathrm{rank}(A,c_1)} f^2(\lambda_j) \geq \mathrm{rank}(A, c_1)(1 - 2^{-p})^2 \overset{(i)}{\geq} \mathrm{rank}(A, c_1)(1 - 1/(2n))^2 \overset{(ii)}{\geq} \mathrm{rank}(A, c_1) - 1,$$

where inequality (i) follows since $2^{-p} \leq 1/(2n)$; inequality (ii) follows since $(1 - 1/(2n))^2 \geq 1 - 1/n$.
Thus, we have

$$\mathbb{E}[\|y\|_2^2] \leq \mathrm{rank}(A, c_2) + 1 \quad \text{and} \quad \mathbb{E}[L] \geq \mathrm{rank}(A, c_1) - 1. \tag{19}$$

Putting together the pieces, we see that $\|y\|_2^2 - \mathbb{E}[\|y\|_2^2]$ is sub-exponential with parameter
$(4(\mathrm{rank}(A, c_2) + 1), 4)$ and $\mathbb{E}[L] - L$ is sub-exponential with parameter $(4\,\mathrm{rank}(A, c_1), 4)$.

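Let $\hat{r}$ be the average of $T$ independent copies of $\|y\|_2^2$, and let $\hat{r}_L$ be the average of $T$
independent copies of $L$. By Lemma 4 (b), we know that $\hat{r} - \mathbb{E}[\hat{r}]$ is sub-exponential with parameter
$(4(\mathrm{rank}(A, c_2) + 1)/T, 4/T)$, and $\mathbb{E}[\hat{r}_L] - \hat{r}_L$ is sub-exponential with parameter $(4\,\mathrm{rank}(A, c_1)/T, 4/T)$.
Plugging these parameters into Lemma 4 (c), for any $0 < \delta < 1$, we find that

$$\mathbb{P}\big[\hat{r} \leq \mathbb{E}[\hat{r}] + \delta(\mathrm{rank}(A, c_2) + 1)\big] \geq 1 - \exp\Big(-\frac{T\delta^2(\mathrm{rank}(A, c_2) + 1)}{32}\Big), \tag{20a}$$

$$\mathbb{P}\big[\hat{r}_L \geq \mathbb{E}[\hat{r}_L] - \delta\,\mathrm{rank}(A, c_1)\big] \geq 1 - \exp\Big(-\frac{T\delta^2\,\mathrm{rank}(A, c_1)}{32}\Big). \tag{20b}$$

Combining inequalities (19), (20a), and (20b) yields

$$\mathbb{P}\big[(1 - \delta)\,\mathrm{rank}(A, c_1) - 1 \leq \hat{r}_L \leq \hat{r} \leq (1 + \delta)(\mathrm{rank}(A, c_2) + 1)\big] \geq 1 - 2\exp\Big(-\frac{T\delta^2\,\mathrm{rank}(A, c_1)}{32}\Big), \tag{21}$$

which completes the proof of inequality (11).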
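
Identity (17) and the concentration of the averaged estimator can be checked in a small simulation. The following is a minimal sketch, not the algorithm analyzed above: it assumes NumPy, forms $f(A)$ through a full eigendecomposition, and replaces $f = q_2(q_1(\cdot))$ by a hypothetical piecewise-linear stand-in that equals $0$ below $c_2$ and $1$ above $c_1$. All sizes are illustrative.

```python
import numpy as np

def smooth_step(lam, c1, c2):
    """Hypothetical stand-in for f: 0 below c2, 1 above c1, linear in between."""
    return np.clip((lam - c2) / (c1 - c2), 0.0, 1.0)

rng = np.random.default_rng(2)
n, true_rank, T = 50, 10, 200   # illustrative sizes (assumption)
c1, c2 = 0.5, 0.1

# Symmetric matrix with `true_rank` eigenvalues above c1 and the rest below c2.
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.concatenate([rng.uniform(0.6, 1.0, true_rank),
                      rng.uniform(0.0, 0.05, n - true_rank)])
A = (V * lam) @ V.T

evals, evecs = np.linalg.eigh(A)
f_vals = smooth_step(evals, c1, c2)
f_A = (evecs * f_vals) @ evecs.T            # the matrix f(A)

# Identity (17) for a single Gaussian vector g.
g = rng.standard_normal(n)
y = f_A @ g
lhs = y @ y
rhs = np.sum(f_vals**2 * (evecs.T @ g) ** 2)

# Average of T independent copies of ||f(A) g||_2^2.
G = rng.standard_normal((n, T))
r_hat = np.mean(np.sum((f_A @ G) ** 2, axis=0))
print(lhs, rhs, r_hat)
```

With this stand-in, $\sum_j f^2(\lambda_j)$ equals the number of eigenvalues above $c_1$, so the average should concentrate around that count as $T$ grows.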

Proof of upper bound (12): It remains to establish the upper bound (12) on the
randomized communication complexity. The subtle issue is that in a discrete message model, we
cannot calculate $f(A)g$ without rounding errors. Indeed, in order to make the rounding error of
each individual message bounded by $\tau$, each machine needs $O(n \log(1/\tau))$ bits to encode a message.
Consequently, the overall communication complexity scales as $O(T m d p n \log(1/\tau))$, where $T$ is the
number of iterations of Algorithm 2, $m$ is the number of machines, the quantities $d$ and $p$ are the
degrees of $q_1$ and $q_2$, and $n$ is the matrix dimension. With the choices given, we have $d = O(1)$
and $p = O(\log n)$. In order to make inequality (11) hold with probability at least $1 - \epsilon$, the upper
bound (21) suggests choosing $T = O(\log(1/\epsilon))$.

Finally, we need to upper bound the quantity $\log(1/\tau)$. In order to do so, let us revisit
Algorithm 2 to see how rounding errors affect the final output. For each integer $k = 1, \ldots, 2p + 1$,
let us denote by $\delta_k$ the error of evaluating $q_1^k(A)g$ using Algorithm 1. It is known [22, Chapter
2.4.2] that the rounding error of evaluating a Chebyshev expansion is bounded by $md\tau$. Thus, we
have $\delta_{k+1} \leq \|q_1(A)\|_2 \delta_k + md\tau$. Since $\|q_1(A)\|_2 \leq 1.1$ by construction, we have the upper bound

$$\delta_k \leq 10(1.1^{k+1} - 1)\, md\tau. \tag{22}$$

For a polynomial of the form $q_2(x) = \sum_{i=0}^{2p+1} a_i x^i$, we have $y = \sum_{i=0}^{2p+1} a_i q_1^i(A) g$. As a
consequence, there is a universal constant $C$ such that the error in evaluating $y$ is bounded by

$$C \sum_{i=0}^{2p+1} \delta_i |a_i| \leq C' (1.1)^{2p+1}\, md\tau \sum_{i=0}^{2p+1} |a_i|.$$

By the definition of the polynomial $q_2$ and the binomial theorem, we have

$$\sum_{i=0}^{2p+1} |a_i| \leq \frac{2^p}{B(p+1, p+1)} = \frac{2^p (2p+1)!}{(p!)^2}.$$

Putting the pieces together, in order to make the overall error small, it suffices to choose $\tau$ of the
order $(mdn)^{-1} 2^{-4p}$. Doing so ensures that $\log(1/\tau) = O(p \log(mdn))$, which when combined with
our earlier upper bounds on $d$, $p$ and $T$, establishes the claim (12).
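
The coefficient bound used above can be verified exactly for small $p$. The following is a minimal sketch using only the Python standard library. It assumes the definition of $q_2$ as the normalized integral $\int_0^z t^p(1-t)^p\,dt \,/\, B(p+1, p+1)$ from Appendix A, expands the integrand with the binomial theorem, and uses hypothetical function names.

```python
from fractions import Fraction
from math import comb, factorial

def beta_value(p):
    """B(p+1, p+1) = (p!)^2 / (2p+1)! as an exact fraction."""
    return Fraction(factorial(p) ** 2, factorial(2 * p + 1))

def q2_coefficients(p):
    """Coefficients a_0, ..., a_{2p+1} of q_2(z) = sum_i a_i z^i."""
    coeffs = [Fraction(0)] * (2 * p + 2)
    for k in range(p + 1):
        # t^p (1 - t)^p contributes (-1)^k C(p, k) t^(p + k); integrate term by term.
        coeffs[p + k + 1] = Fraction((-1) ** k * comb(p, k), p + k + 1) / beta_value(p)
    return coeffs

def abs_coefficient_sum(p):
    return sum(abs(a) for a in q2_coefficients(p))

def stated_bound(p):
    """The bound 2^p (2p+1)! / (p!)^2 from the display above."""
    return Fraction(2 ** p * factorial(2 * p + 1), factorial(p) ** 2)

for p in range(1, 6):
    print(p, abs_coefficient_sum(p), stated_bound(p))
```

Exact rational arithmetic avoids any rounding in the comparison, and the coefficients sum to $q_2(1) = 1$, which serves as a sanity check on the expansion.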


4.3 Proof of Theorem 3 

In order to prove Theorem 3, it suffices to consider the two-player setting, since the first two
machines can always simulate the two players Alice and Bob. Our proof proceeds via reduction
from the 2-SUM problem [29], in which Alice and Bob have inputs $(U_1, \ldots, U_r)$ and $(V_1, \ldots, V_r)$,
where each $U_i$ and $V_i$ are subsets of $\{1, \ldots, L\}$. It is promised that for every index $i \in \{1, \ldots, r\}$,
the intersection of $U_i$ and $V_i$ contains at most one element. The goal is to compute the sum
$\sum_{i=1}^r |U_i \cap V_i|$ up to an additive error of $\sqrt{r}/2$. Woodruff and Zhang [29] showed that the randomized
communication complexity of the 2-SUM problem is lower bounded as $\Omega(rL)$.

We note here that when $r \geq 16$, the same communication complexity lower bound holds if we
allow the additive error to be $2\sqrt{r}$. To see this, suppose that Alice and Bob have inputs of length
$r/16$ instead of $r$. By replicating their inputs 16 times, each of Alice and Bob can begin with an
input of length $r$. Assume that by using some algorithm, they can compute the 2-SUM for the
replicated input with additive error at most $2\sqrt{r}$. In this way, they have computed the 2-SUM
for the original input with additive error at most $\sqrt{r}/8$. Note that $\sqrt{r}/8 = \sqrt{r/16}/2$. The lower
bound on the 2-SUM problem implies that the communication cost of the algorithm is $\Omega(rL/16)$,
which is on the same order as $\Omega(rL)$.

To perform the reduction, let $L = \lfloor n/r - 1 \rfloor$. Since $r \leq n/2$, we have $L \geq 1$. Suppose that Alice
and Bob are given subsets $(U_1, \ldots, U_r)$ and $(V_1, \ldots, V_r)$, which define an underlying instance of the
2-SUM problem. Based on these subsets, we construct two $n$-dimensional matrices $A_1$ and $A_2$ and
the matrix sum $A := A_1 + A_2$; we then argue that any algorithm that can estimate the generalized
matrix rank of $A$ can solve the underlying 2-SUM problem.

The reduction consists of the following steps. First, Alice constructs a matrix $X$ of dimensions
$rL \times n$ as follows. For each $i \in \{1, \ldots, r\}$ and $j \in \{1, \ldots, L\}$, define $t(i,j) = (i - 1)L + j$, and let
$X_{t(i,j)}$ denote the associated row of $X$. Letting $e_{t(i,j)} \in \mathbb{R}^n$ denote the canonical basis vector (with
a single one in entry $t(i,j)$), we define

$$X_{t(i,j)} = \begin{cases} e_{t(i,j)} & \text{if } j \in U_i, \\ 0 & \text{otherwise.} \end{cases}$$

Second, Bob constructs a matrix $Y$ of dimensions $rL \times n$ following the same rule as Alice, but using
the subsets $(V_1, \ldots, V_r)$ in place of $(U_1, \ldots, U_r)$. Now define the $n \times n$ matrices

$$A_1 := c_2 \Big(X^T X + \sum_{i=1}^r e_{rL+i} e_{rL+i}^T\Big) \quad \text{and} \quad A_2 := c_2 \Big(Y^T Y + \sum_{i=1}^r e_{rL+i} e_{rL+i}^T\Big).$$

With these definitions, it can be verified that $\|A\|_2 \leq 2c_2 \leq 1$, and moreover that all eigenvalues of
$A$ are either equal to $2c_2$ or at most $c_2$. Since $c_1 < 2c_2$, the quantities $\mathrm{rank}(A, c_1)$ and $\mathrm{rank}(A, c_2)$
are equal, and equal to the number of eigenvalues at $2c_2$. The second term in the definition of
$A_1$ and $A_2$ ensures that there are at least $r$ eigenvalues equal to $2c_2$. For all $(i,j)$ pairs such that
$j \in U_i \cap V_i$, the construction of $X$ and $Y$ implies that there are two corresponding rows in $X$ and
$Y$ equal to each other, and both of them are canonical basis vectors. Consequently, they create a
$2c_2$ eigenvalue in matrix $A$. Overall, we have $\mathrm{rank}(A, c_1) = \mathrm{rank}(A, c_2) = r + \sum_{i=1}^r |U_i \cap V_i|$. Since
the problem set-up ensures that $|U_i \cap V_i| \leq 1$, we conclude $r \leq \mathrm{rank}(A, c_1) \leq 2r$.


Now suppose that there is a randomized algorithm estimating the rank of $A$ such that

$$(1 - \delta)\,\mathrm{rank}(A, c_1) \leq \hat{r}(A) \leq (1 + \delta)\,\mathrm{rank}(A, c_2).$$

Introducing the shorthand $s := \sum_{i=1}^r |U_i \cap V_i|$, when $\delta = 1/\sqrt{r}$, we have

$$r + s - (r + s)/\sqrt{r} \leq \hat{r}(A) \leq r + s + (r + s)/\sqrt{r}.$$

Thus, the estimator $\hat{r}(A) - r$ computes $s$ up to additive error $(r + s)/\sqrt{r}$, which is upper bounded by
$2\sqrt{r}$. It means that the rank estimation algorithm solves the 2-SUM problem. As a consequence, the
randomized communication complexity of the rank estimation problem is lower bounded by $\Omega(rL) = \Omega(n)$.
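
The matrix construction in this reduction is concrete enough to check on a toy instance. The following is a minimal sketch assuming NumPy; the instance ($n = 12$, $r = 3$, the particular subsets, and the thresholds $c_1 = 0.6$, $c_2 = 0.4$) is a hypothetical example chosen so that $c_2 < c_1 < 2c_2$, and the function names are illustrative.

```python
import numpy as np

def indicator_rows(subsets, L, n):
    """Matrix whose row t(i, j) is e_{t(i, j)} if j lies in the i-th subset, else 0."""
    M = np.zeros((len(subsets) * L, n))
    for i, subset in enumerate(subsets, start=1):
        for j in subset:                  # j ranges over {1, ..., L}
            t = (i - 1) * L + j           # 1-based index t(i, j)
            M[t - 1, t - 1] = 1.0
    return M

def build_instance(U, V, n, c2):
    """Return (A_1, A_2) for Alice's subsets U and Bob's subsets V."""
    r = len(U)
    L = n // r - 1
    X, Y = indicator_rows(U, L, n), indicator_rows(V, L, n)
    pad = np.zeros((n, n))
    for i in range(1, r + 1):
        pad[r * L + i - 1, r * L + i - 1] = 1.0   # the term e_{rL+i} e_{rL+i}^T
    return c2 * (X.T @ X + pad), c2 * (Y.T @ Y + pad)

# Toy 2-SUM instance: every pair (U_i, V_i) intersects in at most one element.
U = [{1, 2}, {3}, {1}]
V = [{2, 3}, {1}, {1}]
n, c1, c2 = 12, 0.6, 0.4

A1, A2 = build_instance(U, V, n, c2)
A = A1 + A2
s = sum(len(u & v) for u, v in zip(U, V))
rank_c1 = int(np.sum(np.linalg.eigvalsh(A) > c1))
print(s, rank_c1)
```

Since $A$ is diagonal here, its eigenvalues can be read off directly: entries touched by both players equal $2c_2$, entries touched by one player equal $c_2$, and the $r$ padding entries equal $2c_2$.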


5 Discussion 

In this paper, we have studied the problem of estimating the generalized rank of matrices. Our
main results are to show that in the deterministic setting, sending $\Theta(n^2)$ bits is both necessary and
sufficient in order to obtain any constant relative error. In contrast, when randomized algorithms
are allowed, this scaling is reduced to $\widetilde{O}(n)$.

Our work suggests an important problem, one whose resolution has a number of interesting
consequences. In the current paper, we establish the $\widetilde{O}(n)$ scaling of communication complexity
for achieving a relative error $\delta = 1/\sqrt{r}$, where $r$ is the matrix rank. Moreover, Algorithm 2 does
not guarantee higher accuracies (e.g., $\delta = 1/r$), and as discussed in Section 3.2.3, it is unknown
whether the $\Omega(n)$ lower bound is tight. The same question remains open even for the special case
when all the matrix eigenvalues are either greater than a constant $c$ or equal to zero. In this special
case, if we were to set $c_1 = c$ and $c_2 = 0$ in Algorithm 2, then it would compute ordinary matrix
rank with relative error $\delta = 1/\sqrt{r}$. Although the problem is easier in the sense that all eigenvalues
are promised to lie in the subset $\{0\} \cup (c, 1]$, we are currently not aware of any algorithm with $\widetilde{O}(n)$
communication cost achieving a better error rate. On the other hand, proving a tight lower bound
for arbitrary $\delta$ remains an open problem.

The special case described above is of fundamental interest because it can be reduced to many 
classical problems in linear algebra and convex optimization, as we describe here. More precisely, if 
there is an algorithm solving any of these problems, then it can be used for computing the matrix 
rank with relative error $\delta = 0$. On the other hand, if we obtain a tight lower bound for computing
the matrix rank, then it implies a lower bound for a larger family of problems. We list a subset of 
these problems giving a rough intuition for the reduction. 

To understand the connection, we begin by observing that the problem of rank computation
can be reduced to that of matrix rank testing, in which the goal is to determine whether a given
matrix sum $A := A_1 + \cdots + A_m$ has rank at most $r - 1$, or rank at least $r$, assuming that all
eigenvalues belong to $\{0\} \cup (c, +\infty)$. If there is an algorithm solving this problem for arbitrary
integer $r \leq n$, then we can use it for computing the rank. The reduction is by performing a series
of binary searches, each step deciding whether the rank is above or below a threshold. In turn, the
rank test problem can be further reduced to the following problems:

Singularity testing: The goal of singularity testing is to determine if the sum of matrices
$B := B_1 + \cdots + B_m$ is singular, where machine $i$ stores the PSD matrix $B_i$. Algorithms for singularity
testing can be used for rank testing. The reduction is by using a public random coin to generate a
shared random projection matrix $Q \in \mathbb{R}^{r \times n}$ on each machine and then setting $B_i := Q A_i Q^T$. The
inclusion of the public coin only increases the communication complexity by a moderate amount [18],
in particular by an additive term $O(\log(n))$. On the other hand, with high probability the matrix
$A$ has rank at most $r - 1$ if and only if the matrix $B$ is singular.

Solving linear equations: Now suppose that machine $i$ stores a strictly positive definite matrix
$C_i$ and a vector $y$. The goal is to compute the vector $x$ satisfying $Cx = y$ for $C := C_1 + \cdots + C_m$.
Algorithms for solving linear equations can be used for the singularity test. In particular, let
$C_i := B_i + \lambda I$ and take $y$ to be a random Gaussian vector. If the matrix $B$ is singular, then
the norm $\|x\|_2 \to \infty$ as $\lambda \to 0$. Otherwise, it remains finite as $\lambda \to 0$. Thus, we can test a
decreasing sequence of values of $\lambda$ to decide if the matrix is singular. Note that the solution need
not be exact, since we only test if the $\ell_2$-norm remains finite.

Convex optimization: Suppose that each machine has a strictly convex function $f_i$, and the
overall goal is to compute a vector $x$ that minimizes the function $x \mapsto f(x) := f_1(x) + \cdots + f_m(x)$.
The algorithms solving this problem can be used for solving linear equations. In particular, for a
strictly positive definite matrix $C_i$, the function $f_i(x) := \frac{1}{2} x^T C_i x - \frac{1}{m} y^T x$ is strictly convex, and
with these choices, the function $f$ is uniquely minimized at $C^{-1} y$. (Since the linear equation solver
doesn't need to be exact, the solution here is also allowed to be approximate.)
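
The three reductions above can be traced end to end on a small example. The following is a minimal sketch assuming NumPy: a Gaussian matrix plays the role of the shared projection $Q$, the regularization levels are illustrative, and all function names are hypothetical. Since each machine adds $\lambda I$, the summed matrix is $C = B + m\lambda I$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, r = 12, 3, 5   # illustrative sizes; the rank test asks whether rank(A) >= r

def psd_parts(rank, n, m, rng):
    """m PSD matrices A_i whose sum has the given rank (shared column space)."""
    U, _ = np.linalg.qr(rng.standard_normal((n, rank)))
    parts = []
    for _ in range(m):
        W = rng.standard_normal((rank, rank))
        parts.append(U @ (W @ W.T) @ U.T)
    return parts

def regularized_solve(B_parts, y, lam):
    """Solve C x = y, where machine i holds C_i = B_i + lam * I."""
    C_parts = [B + lam * np.eye(B.shape[0]) for B in B_parts]
    x = np.linalg.solve(sum(C_parts), y)
    # Gradient of f = sum_i f_i with f_i(x) = x^T C_i x / 2 - y^T x / m.
    grad = sum(C @ x - y / len(C_parts) for C in C_parts)
    return x, grad

def solution_norms(rank_of_A, lams=(1e-6, 1e-8, 1e-10)):
    A_parts = psd_parts(rank_of_A, n, m, rng)
    Q = rng.standard_normal((r, n))              # shared random projection
    B_parts = [Q @ A @ Q.T for A in A_parts]     # B_i = Q A_i Q^T
    y = rng.standard_normal(r)
    return [np.linalg.norm(regularized_solve(B_parts, y, lam)[0]) for lam in lams]

norms_singular = solution_norms(r - 1)   # rank(A) = r - 1, so B is singular
norms_regular = solution_norms(r)        # rank(A) = r, so B is almost surely invertible
print(norms_singular)
print(norms_regular)
```

When $B$ is singular the solution norm grows roughly like $1/\lambda$, while in the invertible case it stabilizes; the gradient returned by `regularized_solve` vanishes at the solution, which is the link to the convex optimization formulation.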


This reduction chain suggests the importance of studying matrix rank estimation, especially for 
characterizing lower bounds on communication complexity. We hope the results in this paper are 
a meaningful first step in exploring this problem area. 

A Proof of Lemma 1

The function $q_2$ is monotonically increasing on $[0,1]$. In addition, we have $q_2(0) = 0$ and $q_2(1) = 1$,
and hence $q_2(z) \in [0,1]$ for all $z \in [0,1]$. Let us refine this analysis on two end intervals: namely,
$z \in [-0.1, 0.1]$ and $z \in [0.9, 1.1]$. For $z \in [-0.1, 0.1]$, it is easy to observe from the definition of $q_2$
that $q_2(z) \geq 0$. Moreover, for $z \in [-0.1, 0.1]$ we have $|z(1 - z)| \leq 0.11$. Thus,

$$q_2(z) = \frac{\int_0^z t^p(1-t)^p\,dt}{\int_0^1 t^p(1-t)^p\,dt} \leq \frac{\int_0^z t^p(1-t)^p\,dt}{\int_{0.4}^{0.6} t^p(1-t)^p\,dt} \leq \frac{0.1 \times (0.11)^p}{0.2 \times (0.24)^p} \leq 2^{-p}.$$

The function $q_2$ is symmetric in the sense that $q_2(z) + q_2(1 - z) = 1$. Thus, for $z \in [0.9, 1.1]$, we
have $q_2(z) = 1 - q_2(1 - z) \in [1 - 2^{-p}, 1]$. In summary, we have proved that

$$0 \leq q_2(z) \leq 1 \quad \text{for } z \in [-0.1, 1.1], \tag{23a}$$

$$q_2(z) \leq 2^{-p} \quad \text{for } z \in [-0.1, 0.1], \tag{23b}$$

$$q_2(z) \geq 1 - 2^{-p} \quad \text{for } z \in [0.9, 1.1]. \tag{23c}$$

By the standard uniform Chebyshev approximation, we are guaranteed that $q_1(x) \in [-0.1, 1.1]$
for all $x \in [0,1]$. Thus, inequality (23a) implies that $q_2(q_1(x)) \in [0,1]$ for all $x \in [0,1]$. If $x \in [0, c_2]$,
then $q_1(x) \in [-0.1, 0.1]$, and thus inequality (23b) implies $q_2(q_1(x)) \leq 2^{-p}$. If $x \in [c_1, 1]$, then
$q_1(x) \in [0.9, 1.1]$, and thus inequality (23c) implies $q_2(q_1(x)) \geq 1 - 2^{-p}$. Combining the last two
inequalities yields that

$$|q_2(q_1(x)) - H_{c_1,c_2}(x)| \leq 2^{-p} \quad \text{for all } x \in [0, c_2] \cup [c_1, 1].$$
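
The end-interval bounds (23b) and (23c) can be checked exactly at $z = 0.1$ and $z = 0.9$. The following is a minimal sketch using only the Python standard library. It relies on the standard identity that, for integer parameters, the regularized incomplete beta function $I_z(p+1, p+1)$ equals the probability that a Binomial$(2p+1, z)$ variable is at least $p + 1$; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def q2(z, p):
    """q_2(z) = I_z(p+1, p+1), written as a binomial tail sum (exact for Fraction z)."""
    n = 2 * p + 1
    return sum(comb(n, j) * z**j * (1 - z) ** (n - j) for j in range(p + 1, n + 1))

tenth = Fraction(1, 10)
for p in range(1, 6):
    low, high = q2(tenth, p), q2(1 - tenth, p)
    # Compare against the thresholds 2^{-p} and 1 - 2^{-p}.
    print(p, float(low), float(Fraction(1, 2**p)), float(high))
```

Because $q_2$ is increasing on $[0,1]$, the values at $0.1$ and $0.9$ control the whole of $[0, 0.1]$ and $[0.9, 1]$; the symmetry $q_2(z) + q_2(1 - z) = 1$ holds exactly in rational arithmetic.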

B Proof of Lemma 3 

Let $q_t$ be the $t$-th row of $Q_2$, and let $Q^{(t)} \in \mathbb{R}^{(r+t) \times n}$ be the matrix whose first $r$ rows are the rows
of $Q_1$, and its remaining $t$ rows are $q_1, \ldots, q_t$. Let $q_{t+1}^{\parallel}$ be the projection of $q_{t+1}$ to the subspace
generated by the rows of $Q^{(t)}$, and let $q_{t+1}^{\perp} := q_{t+1} - q_{t+1}^{\parallel}$. We have

$$(Q^{(t+1)})^T Q^{(t+1)} = (Q^{(t)})^T Q^{(t)} + q_{t+1}^T q_{t+1} = (Q^{(t)})^T Q^{(t)} + (q_{t+1}^{\parallel})^T q_{t+1}^{\parallel} + (q_{t+1}^{\perp})^T q_{t+1}^{\perp} \succeq (Q^{(t)})^T Q^{(t)} + (q_{t+1}^{\perp})^T q_{t+1}^{\perp}.$$

This inequality yields the lower bound

$$Q_1^T Q_1 + Q_2^T Q_2 \succeq Q_1^T Q_1 + \sum_{t=1}^r (q_t^{\perp})^T q_t^{\perp}, \tag{24}$$

where $\succeq$ denotes ordering in the positive semidefinite cone. Note that the rows of $Q_1$ and $\{q_t^{\perp}\}_{t=1}^r$
are mutually orthogonal. To prove that the $(6r/5)$-th largest eigenvalue of $Q_1^T Q_1 + Q_2^T Q_2$ is greater than
$1/10$, it suffices to prove that there are at least $r/5$ vectors in $\{q_t^{\perp}\}_{t=1}^r$ which satisfy $\|q_t^{\perp}\|_2^2 > 1/10$.

Let $S_1$ be the linear subspace generated by $q_1, \ldots, q_{t-1}$ and let $S_1^{\perp}$ be its orthogonal subspace.
The vector $q_t$ is uniformly sampled from a unit sphere in $S_1^{\perp}$. Let $S_2$ be the linear subspace
generated by the rows of $Q^{(t-1)}$. Since $Q^{(t-1)}$ has $r + t - 1$ rows, the subspace $S_2$ has at most $r + t - 1$
dimensions. Without loss of generality, we assume that $S_2$ has $r + t - 1$ dimensions (otherwise, we
expand it to reach the desired dimensionality). We let $S_2^{\perp}$ be the orthogonal subspace of $S_2$. By
definition, $q_t^{\perp}$ is the projection of $q_t$ to $S_2^{\perp}$ (or a linear space that contains $S_2^{\perp}$ if the subspace $S_2$
has been expanded to reach the $r + t - 1$ dimensionality). Let $q_t'$ be the projection of $q_t$ to $S_1^{\perp} \cap S_2^{\perp}$;
then we have

$$\|q_t^{\perp}\|_2^2 \geq \|q_t'\|_2^2. \tag{25}$$

Note that $S_1^{\perp}$ is of dimension $n - t + 1$ and $S_2^{\perp}$ is of dimension $n - r - t + 1$. Thus, the dimension
of $S_1^{\perp} \cap S_2^{\perp}$ is at least $n - r - 2t + 2$. Constructing $q_t'$ is equivalent to projecting a random vector
in the $(n - t + 1)$-dimension sphere to a $(n - r - 2t + 2)$-dimension subspace. It is a standard
result (e.g. [9, Lemma 2.2]) that

$$\mathbb{P}\Big[\|q_t'\|_2^2 \leq \beta\, \frac{n - r - 2t + 2}{n - t + 1}\Big] \leq \exp\Big(\frac{n - r - 2t + 2}{2}\,(1 - \beta + \log \beta)\Big) \quad \text{for any } \beta < 1.$$

Setting $\beta = 0.3$ and using the fact that $t \leq r \leq n/4$, we find that

$$\mathbb{P}\big[\|q_t'\|_2^2 \leq 1/10\big] \leq \exp\Big(\frac{n - n/4 - n/2 + 2}{2}\,(1 - 0.3 + \log(0.3))\Big) \leq \exp(-n/16). \tag{26}$$

Defining the event $\mathcal{E}_t := \{\|q_t'\|_2^2 \leq 1/10\}$, note that inequality (26) yields $\mathbb{P}[\mathcal{E}_t] \leq \exp(-n/16)$. Since
$q_t'$ is the projection of a random unit vector to a subspace of constant dimension, the events $\{\mathcal{E}_t\}_{t=1}^r$
are mutually independent, and hence

$$\mathbb{P}\Big[\text{at least } \tfrac{4r}{5} \text{ events in } \{\mathcal{E}_t\}_{t=1}^r \text{ occur}\Big] \leq \binom{r}{4r/5} \big(\exp(-n/16)\big)^{4r/5} \leq \exp\Big(\frac{r \log(r)}{5} - \frac{rn}{20}\Big) \leq \exp\Big(-\frac{3rn}{100}\Big),$$

where the last inequality follows since any integer $r \leq n/4$ satisfies $\log(r) \leq n/10$. Thus, with probability
at least $1 - \exp(-\frac{3rn}{100})$, there are at least $r/5$ rows satisfying $\|q_t'\|_2^2 > 1/10$. Combining this
result with inequalities (24) and (25) completes the proof.
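
The two numerical steps in this proof, namely the bound $\exp(-n/16)$ in (26) and the final passage to $\exp(-3rn/100)$, are simple to confirm over a range of sizes. The following is a minimal sketch using only the Python standard library; the ranges of $n$ and $r$ are illustrative, $r$ is taken to be a multiple of 5 so that $4r/5$ is an integer, and the function names are hypothetical.

```python
from math import comb, log

def exponent_in_26(n):
    """Exponent in (26) at the worst case t = r = n/4, with beta = 0.3."""
    return (n - n / 4 - n / 2 + 2) / 2 * (1 - 0.3 + log(0.3))

def log_union_bound(r, n):
    """Logarithm of C(r, 4r/5) * exp(-n/16)^(4r/5), for r divisible by 5."""
    k = 4 * r // 5
    return log(comb(r, k)) - k * n / 16

step_26_ok = all(exponent_in_26(n) <= -n / 16 for n in range(4, 2001))
final_step_ok = all(
    log_union_bound(r, n) <= -3 * r * n / 100
    for n in range(20, 401, 20)
    for r in range(5, n // 4 + 1, 5)
)
print(step_26_ok, final_step_ok)
```

The check works on the logarithmic scale to avoid underflow for large $rn$.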


C Proof of Lemma 4 

The claimed facts about sub-exponential random variables are standard [3], but we provide proofs
here for completeness.

Part (a): Let $Z$ be a $\chi^2$ variable with one degree of freedom. Its moment generating function takes
the form

$$\mathbb{E}[\exp(t(Z - 1))] = (1 - 2t)^{-1/2} e^{-t} \quad \text{for } t < 1/2.$$

Some elementary algebra shows that $(1 - 2t)^{-1/2} e^{-t} \leq e^{2t^2}$ for any $t \in [-1/4, 1/4]$. Thus, we have
$\mathbb{E}[\exp(t(Z - 1))] \leq e^{2t^2}$ for $|t| \leq 1/4$, verifying that the recentered variable $X = Z - 1$ is sub-exponential
with parameter $(4, 4)$. Also by the moment generating function of $Z$, we have

$$\mathbb{E}[\exp(t(1 - Z))] = (1 + 2t)^{-1/2} e^{t} \quad \text{for } t > -1/2.$$

Replacing $t$ by $-t$ and comparing with the previous conclusion reveals that $1 - Z$ is sub-exponential
with parameter $(4, 4)$.


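Part (b): Suppose that $Z_1, \ldots, Z_n$ are independent and $Z_i$ is sub-exponential with parameter
$(\sigma_i^2, \beta_i)$. By the definition of sub-exponential random variable, we have

$$\mathbb{E}\Big[\exp\Big(t \sum_{i=1}^n \alpha_i Z_i\Big)\Big] = \prod_{i=1}^n \mathbb{E}[\exp(t \alpha_i Z_i)] \leq \prod_{i=1}^n \exp\big((t\alpha_i)^2 \sigma_i^2 / 2\big) = \exp\Big(\frac{t^2 \sum_{i=1}^n \alpha_i^2 \sigma_i^2}{2}\Big)$$

for all $|t| \leq \min_{i \in [n]} \{1/(\alpha_i \beta_i)\}$. This bound establishes that $\sum_{i=1}^n \alpha_i Z_i$ is sub-exponential with
parameter $\big(\sum_{i=1}^n \alpha_i^2 \sigma_i^2, \max_{i \in [n]} \{\alpha_i \beta_i\}\big)$, as claimed.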


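Part (c): Notice that $\mathbb{P}[Z \geq t] = \mathbb{P}[e^{\lambda Z} \geq e^{\lambda t}]$ with any $\lambda > 0$. Applying Markov's inequality
yields

$$\mathbb{P}[Z \geq t] \leq \frac{\mathbb{E}[e^{\lambda Z}]}{e^{\lambda t}} \leq \exp\Big(-\lambda t + \frac{\lambda^2 \sigma^2}{2}\Big) \quad \text{for } \lambda \leq 1/\beta,$$

where the last step follows since $Z$ is sub-exponential with parameter $(\sigma^2, \beta)$. Notice that the
minimum of $-\lambda t + \frac{\lambda^2 \sigma^2}{2}$ occurs when $\lambda^* = t/\sigma^2$. Since $t < \sigma^2/\beta$, we have $\lambda^* < 1/\beta$, verifying the
validity of $\lambda^*$. Plugging $\lambda^*$ into the previous inequality completes the proof.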
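
Parts (a) and (c) can be checked numerically for the $\chi^2_1$ case. The following is a minimal sketch using only the Python standard library: it evaluates the moment generating function bound on a grid over $[-1/4, 1/4]$ and compares the exact upper tail of $Z - 1$ (expressed through the complementary error function) with the bound $e^{-t^2/8}$ for $t \in [0, 1)$, which corresponds to $(\sigma^2, \beta) = (4, 4)$. The grid resolution and function names are illustrative.

```python
from math import erfc, exp, sqrt

def centered_chi2_mgf(t):
    """E[exp(t (Z - 1))] for Z chi-square with one degree of freedom, valid for t < 1/2."""
    return (1 - 2 * t) ** -0.5 * exp(-t)

def chi2_upper_tail(t):
    """P[Z - 1 >= t] for Z = g^2 with g standard normal, computed exactly via erfc."""
    return erfc(sqrt((1 + t) / 2))

# Part (a): the moment generating function bound on a grid over [-1/4, 1/4].
grid = [k / 400 for k in range(-100, 101)]
mgf_ok = all(centered_chi2_mgf(t) <= exp(2 * t * t) + 1e-12 for t in grid)

# Part (c) with (sigma^2, beta) = (4, 4): tail bound exp(-t^2 / 8) for t in [0, 1).
tail_grid = [k / 100 for k in range(0, 100)]
tail_ok = all(chi2_upper_tail(t) <= exp(-t * t / 8) for t in tail_grid)
print(mgf_ok, tail_ok)
```

A small additive slack is used in the first comparison because the two sides coincide at $t = 0$.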